If you have a text classification task at hand, exploring the zero-shot learning approach is a no-brainer. Zero-shot enables you to classify text without the need for model retraining, making it easier and faster to get started. However, zero-shot is compute-intensive because it must run an inference pass for each candidate label. Enter sparsity to save the… Read More Faster Zero-Shot Learning with Sparsity
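The cost scaling is easy to see in code. Below is a minimal sketch of zero-shot classification with the Hugging Face transformers pipeline (the model name and example text are illustrative, not taken from the post): each candidate label is scored with its own NLI-style forward pass, so latency grows with the size of the label set.

```python
from transformers import pipeline  # Hugging Face Transformers

# Zero-shot classification is framed as NLI: each candidate label becomes a
# hypothesis scored in its own forward pass, so cost scales with label count.
classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",  # example model choice
)

result = classifier(
    "The battery drains within two hours of normal use.",
    candidate_labels=["battery life", "screen quality", "shipping", "pricing"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```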
Neural Magic Announces MLPerf Inference Benchmarks, Delivered Purely in Software Somerville, Massachusetts, September 8, 2022 - Neural Magic, the company leading a new software-delivered AI movement by bringing hyper-performant and scalable ML inferencing to commodity CPU infrastructure, announced today its benchmark results for three Natural Language Processing (NLP) models submitted to the MLPerf Inference Datacenter… Read More Neural Magic Introduces Sparsity to MLPerf, Boosting CPU Performance 175x
This YOLOv5 blog post was edited in September 2022 to reflect more-recent sparsification research, software updates, better performance numbers, and easier benchmarking and transfer learning flows. Prune and Quantize YOLOv5 for a 12x Increase in Performance and a 12x Decrease in Model File Size. Neural Magic improves YOLOv5 model performance on CPUs by using state-of-the-art pruning… Read More YOLOv5 on CPUs: Sparsifying to Achieve GPU-Level Performance and a Smaller Footprint
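For a rough sense of what a sparsified YOLOv5 looks like at inference time, here is a minimal sketch using DeepSparse's Pipeline API. The SparseZoo stub, image path, and output field are illustrative assumptions, not the exact artifacts or code from the post.

```python
from deepsparse import Pipeline  # DeepSparse inference runtime

# Illustrative SparseZoo stub for a pruned + quantized YOLOv5s; the exact
# stub used in the post may differ.
stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94"

# Compile the sparse ONNX model for the local CPU and run object detection.
yolo = Pipeline.create(task="yolo", model_path=stub)
detections = yolo(images=["street_scene.jpg"])  # any local image path

# Output schema assumed: per-image lists of boxes, scores, and labels.
print(detections.boxes[0][:3])
```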
Neural Magic has been busy this summer working on the Community Edition (CE) of our DeepSparse Engine and SparseML libraries; we’re excited to share highlights of releases 1.0 and 1.1. Our 1.0 release was a huge milestone and we could not have gotten here without all of your support and feedback! The full technical release… Read More Neural Magic CE 1.1 and 1.0 Product Releases
Compress BERT-Large with pruning and quantization to create a version that maintains accuracy while beating baseline DistilBERT performance and compression. In 2019, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, a research paper from Google Research, introduced two versions of a transformative new NLP model: BERT-Base and BERT-Large. Both were transformer-based architectures pre-trained on… Read More BERT-Large: Prune Once for DistilBERT Inference Performance
This blog post was edited in July 2022 to reflect more-recent sparsification research, software updates, better performance numbers, and easier benchmarking and transfer learning flows. In this post, we elaborate on how we sparsified ResNet-50 models up to 95% while retaining 99% of the baseline accuracy. Furthermore, we’ll show how we used these sparsified models… Read More ResNet-50 on CPUs: Sparsifying for Better Performance on CPUs
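As a hedged sketch of how such a sparsified ResNet-50 gets exercised on a CPU, the snippet below compiles a SparseZoo checkpoint with DeepSparse and runs a random ImageNet-shaped input; the stub and engine calls are assumptions based on DeepSparse's documented interface, not code from the post.

```python
import numpy as np
from deepsparse import compile_model  # DeepSparse engine API

# Illustrative SparseZoo stub for a 95%-pruned ResNet-50; the exact
# checkpoint referenced in the post may differ.
stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none"

batch_size = 1
engine = compile_model(stub, batch_size=batch_size)

# Standard ImageNet-style input: NCHW float32 at 224x224.
inputs = [np.random.rand(batch_size, 3, 224, 224).astype(np.float32)]
outputs = engine.run(inputs)
print(outputs[0].shape)  # class scores for the single image
```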
You can now automate the deployment of a sparse transformer model with an Amazon SageMaker endpoint. At Neural Magic, we have simplified the arduous task of building the deployment infrastructure (often requiring several steps to complete) by distilling it down to a single CLI command. This post describes the ease of building your personal SageMaker inference endpoint… Read More Deploy Sparse DistilBERT with the DeepSparse Engine on AWS SageMaker for a 7x Increase in Performance
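The one-command deployment flow itself is covered in the post; as a complementary, hedged illustration, here is how a client might query the resulting SageMaker endpoint with boto3. The endpoint name and payload shape are assumptions for a question-answering-style model, not values from the post.

```python
import json

import boto3  # AWS SDK for Python

# Placeholder name for the endpoint created by the deployment flow.
ENDPOINT_NAME = "sparse-distilbert-endpoint"

runtime = boto3.client("sagemaker-runtime")

# Payload shape is an assumption for a question-answering-style model.
payload = {
    "question": "Who maintains the DeepSparse Engine?",
    "context": "Neural Magic builds the DeepSparse Engine for sparse CPU inference.",
}

response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(response["Body"].read().decode("utf-8"))  # prediction returned as JSON
```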
Are you heading to CVPR 2022 in New Orleans this June 19-23? So are we! And we’d love to meet you. Stop by booth #1223 and say hello. Who is Neural Magic? Passionate leaders with deep engineering backgrounds, Neural Magic has developed a sparsity-aware inference engine and open-source tools for maximizing the sparsity of neural… Read More Neural Magic at CVPR 2022
GPU-Level Latency on CPUs With 10x Smaller Models Using oBERT + DeepSparse. The modern world is made up of constant communication happening through text. Think messaging apps, social networks, documentation and collaboration tools, or books. This communication generates enormous amounts of actionable data for companies that wish to use it to improve their users’ experiences.… Read More oBERT: Compound Sparsification Delivers Faster Accurate Models for NLP
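To ground the claim, here is a minimal sketch of serving an oBERT-style sparse-quantized text classifier with DeepSparse; the SparseZoo stub and example sentence are illustrative assumptions rather than the exact setup from the post.

```python
from deepsparse import Pipeline  # DeepSparse inference runtime

# Illustrative stub for a pruned + quantized oBERT sentiment model; the exact
# checkpoint from the post may differ.
stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none"

classifier = Pipeline.create(task="text_classification", model_path=stub)
print(classifier(sequences=["The new release made our support workflow noticeably faster."]))
```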
In alignment with AMD's latest launch, Neural Magic is pushing CPU-based neural network execution to new heights. Using only software and SOTA sparsification research, Neural Magic achieves a 3x relative speedup of inference performance for sparse BERT NLP and ResNet-50 image classification models, with a 20-25% boost attributed to the L3 cache increase from… Read More Increasing Inference Performance with Sparsity and AMD Milan-X