Neural Magic’s DeepSparse Inference Runtime Now Available in the Google Cloud Marketplace
Neural Magic's DeepSparse Inference Runtime can now be deployed directly from the Google Cloud Marketplace. DeepSparse supports various machine types on Google Cloud, so you can quickly deploy the infrastructure that works best for your use case, based on cost and performance. In this blog post, we will illustrate how easy it is to get…

Deploy Serverless Machine Learning Inference on AWS with DeepSparse
This blog, originally posted in December 2022, was edited in May 2023 to reflect updates made to the "Batch Deployment Flow" section and GitHub repo links. Leveraging the advantages of serverless computing, developers can deploy and manage AI-driven applications with unprecedented efficiency, scalability, and cost-effectiveness. With serverless deployments, machine learning inference can execute two…

How to Achieve Up to 3X AI Speedup on DigitalOcean's Premium CPUs
As organizations continue to explore and invest in AI to advance their productivity and bottom lines, it's become clear that deploying models can put quite a dent in IT budgets. Organizations want the option to use cost-efficient commodity CPUs to support AI development. One option for companies to consider is DigitalOcean's Premium CPU-Optimized Droplets, launched…

Detecting Small Objects on High-Resolution Images With SAHI and DeepSparse
With conventional object detection models, it can be challenging to identify small objects due to the limited number of pixels they occupy in the overall image. To help with this issue, you can use a technique like Slicing Aided Hyper Inference (SAHI), which works on top of object detection models to discover small objects without…
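As a rough illustration of the slicing idea behind SAHI, the sketch below computes only the overlapping window coordinates; the actual `sahi` library additionally runs the detector on each window and merges the shifted predictions. Function and parameter names here are illustrative, not the library's API.

```python
def slice_windows(img_w, img_h, slice_w=640, slice_h=640, overlap=0.2):
    """Compute overlapping windows that tile a high-resolution image.

    Each window is run through the detector separately, so a small object
    covers proportionally more pixels of its window than of the full frame;
    per-window detections are then shifted back into full-image coordinates.
    """
    step_w = max(1, int(slice_w * (1 - overlap)))
    step_h = max(1, int(slice_h * (1 - overlap)))
    xs = list(range(0, max(img_w - slice_w, 0) + 1, step_w))
    ys = list(range(0, max(img_h - slice_h, 0) + 1, step_h))
    if xs[-1] + slice_w < img_w:  # make sure the right edge is covered
        xs.append(img_w - slice_w)
    if ys[-1] + slice_h < img_h:  # make sure the bottom edge is covered
        ys.append(img_h - slice_h)
    return [(x, y, min(x + slice_w, img_w), min(y + slice_h, img_h))
            for y in ys for x in xs]

# A 1920x1080 frame yields 8 overlapping 640x640 windows.
windows = slice_windows(1920, 1080)
```

Per the post, SAHI sits on top of an ordinary object detection model; here DeepSparse would serve as the CPU backend running the detector on each slice.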

Neural Magic Scales up MLPerf™ Inference v3.0 Performance With Demonstrated Power Efficiency; No GPUs Needed
Six months ago, Neural Magic shared remarkable MLPerf results, with a 175X increase in CPU performance attained using sparsity. This breakthrough was achieved exclusively with software, using sparsity-aware inferencing techniques. The impressive outcomes showcased the potential of network sparsity to enhance the performance of machine learning models on readily available CPUs. This advancement empowers individuals…

Deploy Optimized Hugging Face Models With DeepSparse and SparseZoo
Pre-trained computer vision (CV) and natural language processing (NLP) models yield high accuracy in real-world applications but suffer from high latency and low throughput due to their large size. The models are also difficult and expensive to deploy. These problems can be addressed by reducing the models' size through pruning and reducing the precision of the weights through…
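The two compression steps the excerpt names can be shown in a toy, pure-Python sketch. This is illustrative only, not SparseML's actual API: real pipelines prune gradually during training to recover accuracy, and use calibrated quantization rather than a single global scale.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    n_prune = int(len(weights) * sparsity)
    # indices ordered from smallest to largest magnitude
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]

def quantize_sym_int8(weights):
    """Symmetric int8 quantization: each weight w is approximated as scale * q."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

w = [0.1, -2.0, 0.05, 3.0, -0.5, 0.01, 1.5, -0.2, 0.3, -0.02]
sparse_w = magnitude_prune(w, sparsity=0.5)  # half the weights become exact zeros
```

At inference time, a runtime like DeepSparse can skip the zeroed multiplications and operate on the low-precision weights, which is where the speedup comes from.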

Sparsify Image Classification Models Faster with SparseML and Deep Lake
Training time is a well-known problem when training computer vision networks such as image classification models. The problem is aggravated by the fact that image data and models are large, therefore requiring a lot of computational resources. Traditionally, these problems have been solved using powerful GPUs to load the data faster. Unfortunately, these GPUs are…

Bringing Software-Delivered AI to the AWS Marketplace (Part 3 of 3-Blog Series)
This is the final entry in our AWS-centric blog series leading up to the AWS Startup Showcase on Thursday, March 9th. We are excited to be a part of this event with other selected visionary AI startups to talk about the future of deploying AI into production at scale. Sign up here to register for…

Build Scalable NLP and Computer Vision Pipelines With DeepSparse - Now Available From the AWS Marketplace (Part 2 of 3-Blog Series)
This is the second entry in our AWS-centric blog series leading up to the AWS Startup Showcase on Thursday, March 9th. We are excited to be a part of this event with other selected visionary AI startups to talk about the future of deploying AI into production at scale. Sign up here to register for this…

Neural Magic’s DeepSparse Inference Runtime Now Available in the AWS Marketplace (Part 1 of 3-Blog Series)
Neural Magic’s DeepSparse Inference Runtime can now be deployed directly from the AWS Marketplace. DeepSparse supports more than 60 different EC2 instance types and sizes, allowing you to quickly deploy the infrastructure that works best for your use case, based on cost and performance. In this blog post, we will illustrate how easy it is…