Neural Magic Introduces Sparsity to MLPerf, Boosting CPU Performance 175x

Neural Magic Announces MLPerf Inference Benchmarks, Delivered Purely in Software. Somerville, Massachusetts, September 8, 2022 - Neural Magic, the company leading a new software-delivered AI movement by bringing hyper-performant and scalable ML inferencing to commodity CPU infrastructure, today announced its benchmark results for three Natural Language Processing (NLP) models submitted to the MLPerf Inference Datacenter…

YOLOv5 on CPUs: Sparsifying to Achieve GPU-Level Performance and a Smaller Footprint

This YOLOv5 blog post was edited in September 2022 to reflect more-recent sparsification research, software updates, better performance numbers, and easier benchmarking and transfer learning flows. Prune and Quantize YOLOv5 for a 12x Increase in Performance and a 12x Decrease in Model Files. Neural Magic improves YOLOv5 model performance on CPUs by using state-of-the-art pruning…
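
The post's headline technique is pruning plus quantization. As a rough illustration only (not Neural Magic's actual recipe, which is driven by SparseML recipes and exported to ONNX for DeepSparse), a plain-PyTorch sketch of magnitude pruning followed by dynamic quantization might look like the following; the torch.hub model and the 90% sparsity level are assumed placeholders:

```python
# Illustrative only: magnitude pruning + dynamic INT8 quantization in plain PyTorch.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Assumed placeholder model: the public YOLOv5s entry point on torch.hub.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Zero out the 90% smallest-magnitude weights in every convolution (assumed level).
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # bake the sparsity mask into the weights

# Dynamically quantize the remaining dense (Linear) layers to INT8.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
torch.save(quantized.state_dict(), "yolov5s_pruned_quantized.pt")
```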

BERT-Large: Prune Once for DistilBERT Inference Performance

Compress BERT-Large with pruning and quantization to create a version that maintains accuracy while beating baseline DistilBERT performance and compression. In 2019, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, a research paper from Google Research, introduced two versions of a transformative new NLP model: BERT-base and BERT-Large. Both were transformer-based architectures pre-trained on…

ResNet-50 on CPUs: Sparsifying for Better Performance on CPUs

This blog post was edited in July 2022 to reflect more-recent sparsification research, software updates, better performance numbers, and easier benchmarking and transfer learning flows. In this post, we elaborate on how we sparsified ResNet-50 models up to 95% while retaining 99% of the baseline accuracy. Furthermore, we’ll show how we used these sparsified models…
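
As a quick way to make a "95% sparse" claim concrete, the small sketch below (not from the post) measures the overall weight sparsity of a ResNet-50; the torchvision model here is a stand-in for a pruned checkpoint such as those the post produces:

```python
# Minimal sketch: report the fraction of zero-valued weights in a ResNet-50.
import torch
from torchvision.models import resnet50

model = resnet50()  # load a pruned state_dict here to check a sparsified model
total, zeros = 0, 0
for name, param in model.named_parameters():
    if name.endswith("weight"):
        total += param.numel()
        zeros += (param == 0).sum().item()

print(f"weight sparsity: {zeros / total:.1%}")  # ~95% for the sparsified checkpoints
```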

Come Build the Future with Labs by Neural Magic

Architect and deploy better machine learning solutions with industry experts. Today, Neural Magic is debuting a new offering that allows you to deliver best-in-class ML solutions leveraging the same engineering talent behind our DeepSparse Engine. Labs by Neural Magic empowers organizations to define (or refine) their AI/ML best practices. Teams will develop their methodology, success…

Deploy Sparse DistilBERT with the DeepSparse Engine on AWS SageMaker for a 7x Increase in Performance

You can now automate the deployment of a sparse transformer model with an Amazon SageMaker endpoint. At Neural Magic, we have simplified the arduous, multi-step task of building inference infrastructure by distilling it down to a single CLI command. This post describes the ease of building your personal SageMaker inference endpoint…
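
The endpoint ultimately serves the model through the DeepSparse engine. As a hedged sketch of what that inference layer looks like when run locally, the snippet below uses DeepSparse's Pipeline API; the task name and model_path are assumptions and would normally point to a sparse DistilBERT ONNX export or SparseZoo stub:

```python
# Sketch: local text-classification inference with DeepSparse (argument names
# may differ slightly across versions; model_path is a placeholder).
from deepsparse import Pipeline

pipeline = Pipeline.create(
    task="text_classification",
    model_path="./sparse-distilbert-onnx",  # placeholder path (assumption)
)
print(pipeline(sequences=["The DeepSparse endpoint responded quickly."]))
```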

Neural Magic at CVPR 2022

Are you heading to CVPR 2022 in New Orleans this June 19-23? So are we! And we’d love to meet you. Stop by booth #1223 and say hello. Who is Neural Magic? A team of passionate leaders with deep engineering backgrounds, Neural Magic has developed a sparsity-aware inference engine and open-source tools for maximizing the sparsity of neural…

oBERT: Compound Sparsification Delivers Faster Accurate Models for NLP

GPU-Level Latency on CPUs With 10x Smaller Models Using oBERT + DeepSparse. The modern world is made up of constant communication happening through text. Think messaging apps, social networks, documentation and collaboration tools, or books. This communication generates enormous amounts of actionable data for companies that wish to use it to improve their users’ experiences…

Neural Magic Joins MLCommons to Help Accelerate ML Innovation Through Transparency and Open Source

Neural Magic Joins MLCommons. Through research, benchmarks, and best practices, Neural Magic is committed to open standards that will guide machine learning (ML) along the path from a research field to a mature industry. In February of 2021, we open-sourced our model sparsification libraries and made our sparsity-aware inference engine freely available for community use…