If you have a text classification task at hand, exploring the zero-shot learning approach is a no-brainer. Zero-shot enables you to classify text without the need for model retraining, making it easier and faster to get started. However, zero-shot is compute-intensive because it must run an inference pass for each candidate label. Enter sparsity to save the… Read More Faster Zero-Shot Learning with Sparsity
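The cost scaling is easy to see in code. Below is a minimal sketch of zero-shot classification with the Hugging Face transformers pipeline (the model name and example text are illustrative, not taken from the post): each candidate label is scored with its own NLI-style forward pass, so latency grows with the size of the label set.

```python
from transformers import pipeline  # Hugging Face Transformers

# Zero-shot classification is framed as NLI: each candidate label becomes a
# hypothesis scored in its own forward pass, so cost scales with label count.
classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",  # example model choice
)

result = classifier(
    "The battery drains within two hours of normal use.",
    candidate_labels=["battery life", "screen quality", "shipping", "pricing"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```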
Neural Magic Announces MLPerf Inference Benchmarks, Delivered Purely in Software Somerville, Massachusetts, September 8, 2022 - Neural Magic, the company leading a new software-delivered AI movement by bringing hyper-performant and scalable ML inferencing to commodity CPU infrastructure, announced today its benchmark results for three Natural Language Processing (NLP) models submitted to the MLPerf Inference Datacenter… Read More Neural Magic Introduces Sparsity to MLPerf, Boosting CPU Performance 175x
This YOLOv5 blog post was edited in September 2022 to reflect more-recent sparsification research, software updates, better performance numbers, and easier benchmarking and transfer learning flows. Prune and Quantize YOLOv5 for a 12x Increase in Performance and a 12x Decrease in Model File Size. Neural Magic improves YOLOv5 model performance on CPUs by using state-of-the-art pruning… Read More YOLOv5 on CPUs: Sparsifying to Achieve GPU-Level Performance and a Smaller Footprint
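For a rough sense of what a sparsified YOLOv5 looks like at inference time, here is a minimal sketch using DeepSparse's Pipeline API. The SparseZoo stub, image path, and output field are illustrative assumptions, not the exact artifacts or code from the post.

```python
from deepsparse import Pipeline  # DeepSparse inference runtime

# Illustrative SparseZoo stub for a pruned + quantized YOLOv5s; the exact
# stub used in the post may differ.
stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94"

# Compile the sparse ONNX model for the local CPU and run object detection.
yolo = Pipeline.create(task="yolo", model_path=stub)
detections = yolo(images=["street_scene.jpg"])  # any local image path

# Output schema assumed: per-image lists of boxes, scores, and labels.
print(detections.boxes[0][:3])
```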
Neural Magic has been busy this summer working on the Community Edition (CE) of our DeepSparse Engine and SparseML libraries; we’re excited to share highlights of releases 1.0 and 1.1. Our 1.0 release was a huge milestone and we could not have gotten here without all of your support and feedback! The full technical release… Read More Neural Magic CE 1.1 and 1.0 Product Releases
Compress BERT-Large with pruning and quantization to create a version that maintains accuracy while beating baseline DistilBERT performance and compression. In 2019, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, a research paper from Google Research, introduced two versions of a transformative new NLP model: BERT-Base and BERT-Large. Both were transformer-based architectures pre-trained on… Read More BERT-Large: Prune Once for DistilBERT Inference Performance
This blog post was edited in July 2022 to reflect more-recent sparsification research, software updates, better performance numbers, and easier benchmarking and transfer learning flows. In this post, we elaborate on how we sparsified ResNet-50 models up to 95% while retaining 99% of the baseline accuracy. Furthermore, we’ll show how we used these sparsified models… Read More ResNet-50 on CPUs: Sparsifying for Better Performance on CPUs
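As a hedged sketch of how such a sparsified ResNet-50 gets exercised on a CPU, the snippet below compiles a SparseZoo checkpoint with DeepSparse and runs a random ImageNet-shaped input; the stub and engine calls are assumptions based on DeepSparse's documented interface, not code from the post.

```python
import numpy as np
from deepsparse import compile_model  # DeepSparse engine API

# Illustrative SparseZoo stub for a 95%-pruned ResNet-50; the exact
# checkpoint referenced in the post may differ.
stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none"

batch_size = 1
engine = compile_model(stub, batch_size=batch_size)

# Standard ImageNet-style input: NCHW float32 at 224x224.
inputs = [np.random.rand(batch_size, 3, 224, 224).astype(np.float32)]
outputs = engine.run(inputs)
print(outputs[0].shape)  # class scores for the single image
```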
You can now automate the deployment of a sparse transformer model with an Amazon SageMaker endpoint. At Neural Magic, we have simplified the arduous task of building the deployment infrastructure (often requiring several steps to complete) by distilling it down to a single CLI command. This post describes the ease of building your personal SageMaker inference endpoint… Read More Deploy Sparse DistilBERT with the DeepSparse Engine on AWS SageMaker for a 7x Increase in Performance
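The one-command deployment flow itself is covered in the post; as a complementary, hedged illustration, here is how a client might query the resulting SageMaker endpoint with boto3. The endpoint name and payload shape are assumptions for a question-answering-style model, not values from the post.

```python
import json

import boto3  # AWS SDK for Python

# Placeholder name for the endpoint created by the deployment flow.
ENDPOINT_NAME = "sparse-distilbert-endpoint"

runtime = boto3.client("sagemaker-runtime")

# Payload shape is an assumption for a question-answering-style model.
payload = {
    "question": "Who maintains the DeepSparse Engine?",
    "context": "Neural Magic builds the DeepSparse Engine for sparse CPU inference.",
}

response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(response["Body"].read().decode("utf-8"))  # prediction returned as JSON
```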
Are you heading to CVPR 2022 in New Orleans this June 19-23? So are we! And we’d love to meet you. Stop by booth #1223 and say hello. Who is Neural Magic? Passionate leaders with deep engineering backgrounds, Neural Magic has developed a sparsity-aware inference engine and open-source tools for maximizing the sparsity of neural… Read More Neural Magic at CVPR 2022
GPU-Level Latency on CPUs With 10x Smaller Models Using oBERT + DeepSparse. The modern world is made up of constant communication happening through text. Think messaging apps, social networks, documentation and collaboration tools, or books. This communication generates enormous amounts of actionable data for companies that wish to use it to improve their users’ experiences.… Read More oBERT: Compound Sparsification Delivers Faster Accurate Models for NLP
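To ground the claim, here is a minimal sketch of serving an oBERT-style sparse-quantized text classifier with DeepSparse; the SparseZoo stub and example sentence are illustrative assumptions rather than the exact setup from the post.

```python
from deepsparse import Pipeline  # DeepSparse inference runtime

# Illustrative stub for a pruned + quantized oBERT sentiment model; the exact
# checkpoint from the post may differ.
stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none"

classifier = Pipeline.create(task="text_classification", model_path=stub)
print(classifier(sequences=["The new release made our support workflow noticeably faster."]))
```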
In alignment with AMD's latest launch, Neural Magic is pushing CPU-based neural network execution to new heights. Using only software and SOTA sparsification research, Neural Magic achieves a 3x relative speedup of inference performance for sparse BERT NLP and ResNet-50 image classification models, with a 20-25% boost attributed to the L3 cache increase from… Read More Increasing Inference Performance with Sparsity and AMD Milan-X