Pruning Hugging Face BERT: Using Compound Sparsification for Faster CPU Inference with Better Accuracy

Apply both pruning and layer-dropping sparsification methods to increase BERT performance anywhere from 3.3x to 14x on CPUs, depending on accuracy constraints. In this post, we go into detail on pruning Hugging Face BERT and describe how sparsification combined with the DeepSparse Engine improves BERT model performance on CPUs. We’ll…
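
As a rough illustration of the deployment side, the sketch below runs a question-answering model through a DeepSparse `Pipeline`. The SparseZoo stub is a hypothetical placeholder, and the exact `Pipeline.create` arguments may differ across deepsparse versions; treat this as a sketch, not the post's exact code.

```python
# A minimal sketch, assuming `pip install deepsparse`. The SparseZoo stub
# below is a hypothetical placeholder -- swap in a real stub or a local
# model deployment directory.
from deepsparse import Pipeline

qa_pipeline = Pipeline.create(
    task="question_answering",
    model_path="zoo:nlp/question_answering/placeholder-stub",  # hypothetical
)

output = qa_pipeline(
    question="What engine runs the sparsified model?",
    context="The sparsified BERT model is executed by the DeepSparse Engine on CPUs.",
)
print(output.answer)
```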

YOLOv5 on CPUs: Sparsifying to Achieve GPU-Level Performance and a Smaller Footprint

Prune and quantize YOLOv5 for a 10x increase in performance with 12x smaller model files. Neural Magic improves YOLOv5 model performance on CPUs by using state-of-the-art pruning and quantization techniques combined with the DeepSparse Engine. In this blog post, we’ll cover our general methodology and demonstrate how to: Leverage the Ultralytics YOLOv5 repository with SparseML’s sparsification…
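
For a sense of how a sparsification recipe plugs into an existing PyTorch training loop, here is a minimal sketch using SparseML's `ScheduledModifierManager`. The toy model, data, and recipe file name are stand-ins for a real YOLOv5 setup, not the post's actual training code.

```python
# A minimal sketch, assuming `pip install sparseml[torch]`. The toy model
# and data stand in for a real YOLOv5 setup; "recipe.yaml" is a placeholder
# for a pruning/quantization recipe.
import torch
from torch.utils.data import DataLoader, TensorDataset
from sparseml.pytorch.optim import ScheduledModifierManager

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2)
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=8
)
loss_fn = torch.nn.CrossEntropyLoss()

# The manager parses the recipe's modifiers and wraps the optimizer so
# pruning (and/or quantization) is applied on the recipe's schedule.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(data))

for epoch in range(int(manager.max_epochs)):
    for x, y in data:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

manager.finalize(model)  # remove hooks, leaving the sparsified weights
```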

New Tutorial: Sparsifying YOLOv3 Using Recipes

Sparsifying YOLOv3 (or any other model) involves removing redundant information from neural networks using algorithms such as pruning and quantization, among others. This sparsification process results in many benefits for deployment environments, including faster inference and smaller file sizes. Unfortunately, many have not realized the benefits due to the complicated process and number of hyperparameters…
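
To make the recipe idea concrete, the sketch below shows an illustrative recipe with an `EpochRangeModifier` and a gradual magnitude pruning modifier, loaded from an inline YAML string. The epoch and sparsity values are placeholders, not a tuned YOLOv3 recipe.

```python
# An illustrative recipe, written inline as a YAML string. It schedules
# gradual magnitude pruning from 5% to 80% sparsity over epochs 1-20;
# all values here are placeholders, not a tuned YOLOv3 recipe.
from sparseml.pytorch.optim import ScheduledModifierManager

recipe = """
modifiers:
    - !EpochRangeModifier
        start_epoch: 0.0
        end_epoch: 25.0

    - !GMPruningModifier
        start_epoch: 1.0
        end_epoch: 20.0
        init_sparsity: 0.05
        final_sparsity: 0.80
        update_frequency: 1.0
        params: __ALL_PRUNABLE__
"""

# from_yaml accepts a file path, SparseZoo stub, or raw YAML string
manager = ScheduledModifierManager.from_yaml(recipe)
print(manager.max_epochs)  # 25.0
```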

Neural Magic Appoints Brian Stevens as Chief Executive Officer

We are excited to announce that industry veteran Brian Stevens will be joining Neural Magic as Chief Executive Officer. Brian brings vast experience in open source, enterprise, and hyper-scale cloud to the team. Before joining Neural Magic, Brian was Vice President and CTO of Google Cloud and Executive Vice President and CTO of Red Hat.…

YOLOv3 on CPUs: Sparsifying to Achieve GPU-Level Performance

Use CPUs to decrease costs and increase deployment flexibility while still achieving GPU-class performance. In this post, we elaborate on how we used state-of-the-art pruning and quantization techniques to improve the performance of YOLOv3 on CPUs. We’ll show that by leveraging the robust YOLO training framework from Ultralytics with SparseML’s sparsification recipes, it is…
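
Once training with a recipe is done, the sparsified PyTorch model is typically exported to ONNX for the DeepSparse Engine. A minimal sketch with SparseML's `ModuleExporter` follows; the toy module and input shape stand in for a trained, pruned YOLOv3.

```python
# A sketch of exporting a sparsified PyTorch module to ONNX for the
# DeepSparse Engine, assuming `pip install sparseml[torch]`. The toy
# module and 416x416 input stand in for a trained, pruned YOLOv3.
import torch
from sparseml.pytorch.utils import ModuleExporter

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())

exporter = ModuleExporter(model, output_dir="exported")
exporter.export_onnx(sample_batch=torch.randn(1, 3, 416, 416))
# writes exported/model.onnx, which deepsparse can compile and run
```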

ResNet-50 on CPUs: Sparsifying for Better Performance on CPUs

In this post, we elaborate on how we measured, on commodity cloud hardware, the throughput and latency of five ResNet-50 v1 models optimized for CPU inference. By the end of the post, you should be able to reproduce these benchmarks using tools available in the Neural Magic GitHub repo, ultimately achieving better performance for ResNet-50 on CPUs.…
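
A rough sketch of how such a throughput/latency measurement can be scripted with the DeepSparse Python API is below. The ONNX path is a placeholder, and `Engine.benchmark`'s keyword arguments may vary by deepsparse version; this is an assumption-laden sketch, not the post's benchmark harness.

```python
# A sketch of a throughput/latency measurement with the DeepSparse
# Python API, assuming `pip install deepsparse`. The ONNX path is a
# placeholder for a (sparsified) ResNet-50 export.
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

onnx_path = "resnet50.onnx"  # placeholder
batch_size = 64

engine = compile_model(onnx_path, batch_size=batch_size)
inputs = generate_random_inputs(onnx_path, batch_size)

results = engine.benchmark(inputs, num_iterations=50, num_warmup_iterations=10)
print(results)  # summarizes items/sec throughput and per-batch latency
```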

Accelerating Machine Learning Inference on CPU with VMware vSphere and Neural Magic

This blog was originally posted by Na Zhang on VMware’s Office of the CTO Blog. You can see the original copy here. Increasingly large deep learning (DL) models require a significant amount of computing, memory, and energy, all of which become a bottleneck in real-time inference where resources are limited. In this post, we detail our…

Sparsify is Open Sourced – Try it Now

Today, we are very excited to provide you with early access to Sparsify, our automated model optimization tool! As deep learning models continue to grow in size, deploying and running them performantly and accurately has required significant investments in FLOPs and system resources. Take GPT-3, for example: with over 175 billion parameters, it takes nearly…