LangChain is one of the most exciting tools in generative AI, offering many interesting design paradigms for building large language model (LLM) applications. However, developers who use LangChain have had to choose between expensive APIs and cumbersome GPUs to power the LLMs in their chains. With Neural Magic, developers can instead accelerate their models on CPU hardware, to… Read more: Building Sparse LLM Applications on CPUs With LangChain and DeepSparse
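In practice, the pairing comes down to a few lines of code. Below is a minimal sketch using LangChain's DeepSparse LLM wrapper; the SparseZoo model stub is illustrative, so substitute any sparse text-generation model exported for DeepSparse.

```python
# Minimal sketch: powering a LangChain LLM with DeepSparse on CPU.
# Assumes `pip install deepsparse langchain`; the SparseZoo stub
# below is illustrative, not a specific recommendation.
from langchain.llms import DeepSparse

llm = DeepSparse(
    model="zoo:nlg/text_generation/mpt-7b/pytorch/huggingface/dolly/pruned50_quant-none"
)

# The wrapper behaves like any other LangChain LLM, so it can slot
# into chains, agents, and prompt templates unchanged.
print(llm("Explain sparse inference in one sentence:"))
```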
The arrival of capable open-source large language models (LLMs) like MosaicML's MPT and Meta's Llama 2 has made it easier for enterprises to explore generative AI for their business challenges. Yet adoption of open-source models for commercial applications is still hampered by two key problems. In our recent paper, Sparse Fine-Tuning for Inference Acceleration… Read more: Sparse Fine-Tuning for Accelerating Large Language Models with DeepSparse
As artificial intelligence (AI) and machine learning (ML) have become the backbone of technological innovation, companies race to provide the best solutions for businesses seeking optimization, efficiency, and scalability. Our founders launched Neural Magic so customers wouldn't hit the same roadblocks they themselves encountered when it came to utilizing maximum hardware capabilities for… Read more: Optimal CPU AI Inference with AMD EPYC™ 8004 Series Processors and Neural Magic DeepSparse
Building on our MLPerf™ Inference v2.1 and v3.0 results: what started as a workaround for limited research resources has become, through founders Nir Shavit and Alexander Matveev, a solution for customers that need a more cost-efficient, yet still performant, alternative to GPUs for deep learning initiatives. We are excited to share our latest contributions to… Read more: Latest MLPerf™ Inference v3.1 Results Show 50X Faster AI Inference for x86 and ARM from Neural Magic
HPE ProLiant servers with AMD EPYC™ CPUs provide exceptional value for AI workloads, especially when combined with Neural Magic, unlocking incredible levels of AI inference performance. Ever-larger machine learning (ML) models place ever-larger demands on hardware; Neural Magic helps alleviate those demands with a software-accelerated AI inference solution that delivers impressive ML… Read more: Unleash Software-Accelerated AI Inference with Neural Magic on HPE ProLiant Gen11 Servers Powered by 4th Gen AMD EPYC Processors
Amazon Web Services (AWS) Elastic Kubernetes Service (EKS) and Neural Magic's DeepSparse work together seamlessly to provide customers with impactful deep learning inference for production environments. This paired solution offers an accessible and efficient alternative to the default hardware choices often used in large-scale machine learning deployments. Advantages of integrating these two technologies include improved… Read more: Scaling CPU Inference on AWS EKS with DeepSparse
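To make the deployment concrete, here is a minimal sketch of a client calling a DeepSparse Server pod exposed through an EKS Service. The hostname is hypothetical, and the port and /predict route follow deepsparse.server defaults; adjust all three to match your cluster.

```python
# Minimal sketch: querying a DeepSparse Server running on EKS.
# The Service hostname below is hypothetical; 5543 and /predict
# are the deepsparse.server defaults at the time of writing.
import requests

ENDPOINT = "http://deepsparse-svc.example.com:5543/predict"

response = requests.post(
    ENDPOINT,
    json={"sequences": "DeepSparse on EKS makes CPU inference easy to scale."},
    timeout=10,
)
print(response.json())
```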
Here are the highlights of the 1.5 product release of the DeepSparse, SparseML, and SparseZoo libraries. The full technical release notes are always available in the GitHub release indexes linked from each Neural Magic repository. Join us in the Neural Magic Community Slack if you have any questions, need assistance, or simply want to introduce yourself. For… Read more: Neural Magic 1.5 Product Release
In an era where the hunger for data is driving an exponential surge in computational demand, organizations are realizing the need for power-efficient commodity compute to support artificial intelligence (AI) workloads. This is the basis of our work at Neural Magic, as we help customers optimize IT infrastructure to support AI projects and to drive… Read more: Advancing AI Inference Density with Neural Magic and AMD
Neural Magic has added support for large language models (LLMs) in DeepSparse, enabling inference speed-ups from compression techniques like SparseGPT on commodity CPUs. SparseGPT prunes and quantizes LLMs quickly, in one shot: state-of-the-art language models are very large, with parameter counts in the billions, and deploying one is expensive, often requiring multiple GPUs just… Read more: Speed up your LLMs with SparseGPT and DeepSparse on CPUs
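For a rough sense of the developer experience, here is a sketch of LLM inference with a SparseGPT-compressed model in DeepSparse; the SparseZoo stub is illustrative, and the exact pipeline API may vary across DeepSparse versions.

```python
# Minimal sketch: CPU text generation with a SparseGPT-pruned and
# quantized LLM via DeepSparse. The SparseZoo stub is illustrative.
from deepsparse import TextGeneration

pipeline = TextGeneration(
    model="zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized"
)

result = pipeline("Why does sparsity speed up CPU inference?", max_new_tokens=75)
print(result.generations[0].text)
```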
This is the second entry in our Google Cloud blog series. We recently launched the DeepSparse Inference Runtime on the Google Cloud Marketplace to make it easy for ML practitioners to deploy their models in a few clicks. Latency, accuracy, and inference costs are all critical when deploying natural language processing (NLP)… Read more: Build Scalable NLP and Computer Vision Pipelines With DeepSparse - Now Available From the Google Cloud Marketplace
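For a sense of the pipeline interface the post builds on, here is a minimal DeepSparse NLP pipeline; the sentiment-analysis task and SparseZoo stub are illustrative choices, and computer vision tasks such as image classification follow the same Pipeline.create pattern.

```python
# Minimal sketch: a DeepSparse sentiment-analysis pipeline on CPU.
# The SparseZoo stub points at a sparse-quantized BERT-style model
# fine-tuned on SST-2 and is illustrative; any compatible model works.
from deepsparse import Pipeline

sentiment = Pipeline.create(
    task="sentiment-analysis",
    model_path="zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none",
)

print(sentiment("DeepSparse pipelines make CPU deployment straightforward."))
```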