Use CPUs to decrease costs and increase deployment flexibility while still achieving GPU-class performance. In this post, we elaborate on how we used state-of-the-art pruning and quantization techniques to improve the performance of YOLOv3 on CPUs. We’ll show that by leveraging the robust YOLO training framework from Ultralytics with SparseML’s sparsification recipes, it is…
How Neural Magic’s Deep Sparse Technology Works
To understand how Neural Magic’s Deep Sparse technology works, it’s important to quickly cover the journey of our founders. While mapping the neural connections in the brain at MIT, Neural Magic’s founders Nir Shavit and Alexander Matveev were frustrated by the many limitations imposed by GPUs. Along the way, they stopped to ask themselves a…
Neural Magic 1.4 Product Release
We are excited to announce the Neural Magic 1.4 product release. This milestone contains new product features, an improved user experience, and stability enhancements that make it simpler for our clients to achieve GPU-class performance on commodity CPUs. NEW – Introducing Sparsify BETA: experience-driven tooling to simplify the process of analyzing and optimizing…
Using Sparse-Quantization in Inference: NeurIPS 2020
Did you know that most weights in a neural network are actually useless? In other words, most weights can be removed with little to no impact on the loss. But how, and why, would you optimize a deep learning model in practice? Through a combination of pruning and quantization (or “sparse-quantization”), you can drastically improve…
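The two ideas behind sparse-quantization can be sketched in a few lines. The following is a minimal NumPy illustration, not SparseML’s or Neural Magic’s actual implementation: magnitude pruning zeros out the smallest-magnitude weights, and symmetric int8 quantization maps the surviving values onto an integer grid with a single scale factor.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto [-127, 127] with one scale."""
    max_abs = float(np.abs(weights).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.9)   # ~90% of weights become zero
q, scale = quantize_int8(w_sparse)            # int8 values + float scale
```

In practice both steps are applied during or after training with fine-tuning in between, so the network can recover the small accuracy loss each step introduces.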
Neural Magic 1.2 Product Release
We are excited to announce the Neural Magic 1.2 product release. This product milestone contains new feature updates, an improved user experience, and stability enhancements that make it simpler for our clients to achieve price-performance on commodity CPUs. Neural Magic Inference Engine: enables clients to run mission-critical deep learning models on commodity…
Speeding Up Memory-Bound Object Detection Models: MobileNetV2_SSD
TL;DR: Learn more about increasing performance for MobileNetV2_SSD models via pruning and decreased post-processing time. Read time: 3 minutes, 15 seconds. In many object detection scenarios, there’s not a moment to lose. A fraction of a second can mean the difference between a self-driving car hitting a dog crossing the street or narrowly missing it.…
Part 4: Sparsity per Layer Hyperparameter
TL;DR: In addition to the general hyperparameters described in the previous post, the sparsity to target per layer is arguably the most critical hyperparameter you can set. Below we explain why and show you how. Reading time: 10 minutes, 47 seconds. Welcome to Part 4 in Neural Magic’s five-part blog series on…
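As a rough illustration of what “sparsity per layer” means, here is a minimal NumPy sketch with hypothetical layer names and targets (not a recipe from the post): each layer gets its own magnitude-pruning threshold, so small or sensitive layers can stay denser while the large layers, where most of the parameters and compute live, are pruned hardest.

```python
import numpy as np

# Hypothetical per-layer sparsity targets (illustrative only):
# early and output layers are kept denser because they are small and
# sensitive; the large middle layers are pruned the most.
per_layer_sparsity = {
    "conv1": 0.0,    # first layer: leave dense
    "conv2": 0.80,
    "conv3": 0.90,   # largest layer: prune hardest
    "fc":    0.60,   # output head: moderate pruning
}

def prune_layer(weights, sparsity):
    """Zero the `sparsity` fraction of smallest-magnitude weights in one layer."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
model = {name: rng.normal(size=(32, 32)) for name in per_layer_sparsity}
pruned_model = {name: prune_layer(w, per_layer_sparsity[name])
                for name, w in model.items()}
```

Tuning these per-layer targets against a validation metric is what the full post walks through; the sketch above only shows the mechanics of applying a different target to each layer.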
Machine Learning Engineer Spotlight: Mani Sarkar
In our new blog series, we’re interviewing data scientists and machine learning engineers about their career paths, areas of interest, and thoughts on the future of AI. We kick off this week with a 20-year veteran and jack-of-all-trades in machine learning and data science: Mani Sarkar. Mani is a strategic machine learning…
Part 2: An Intro to Gradual Magnitude Pruning (GMP)
TL;DR: Gradual Magnitude Pruning (GMP) is one of the best pruning approaches to use due to its simplicity, ease of use, and performance on a wide variety of models. There are three general stages to GMP: stabilization, pruning, and fine-tuning. Reading time: 5 minutes, 6 seconds. Welcome to Part 2 in Neural Magic’s five-part blog…
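During the pruning stage of GMP, sparsity is typically ramped up along a cubic schedule (Zhu & Gupta, 2017) rather than applied all at once. A minimal sketch of that schedule, under the assumption that the standard formulation is used (the function name and default targets here are illustrative):

```python
def gmp_sparsity(step, total_steps, initial_sparsity=0.05, final_sparsity=0.85):
    """Cubic sparsity schedule for gradual magnitude pruning:
    sparsity rises quickly at first, then levels off as it
    approaches the final target."""
    if step >= total_steps:
        return final_sparsity
    progress = step / total_steps
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3

# The three GMP stages from the post, mapped onto a training run:
# - stabilization: train densely for a few epochs before pruning starts
# - pruning: at each update, re-prune to gmp_sparsity(step, total_steps)
# - fine-tuning: hold sparsity at final_sparsity and keep training
```

Because the schedule front-loads the pruning while many redundant weights remain, the network has the rest of the run to recover accuracy at each intermediate sparsity level.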
Neural Magic 1.1 Product Release
We are excited to announce the Neural Magic 1.1 product release. This product milestone contains new feature updates, an improved user experience, and stability enhancements that make it simpler for our clients to achieve GPU-class performance on commodity CPUs. Neural Magic Inference Engine: enables clients to run mission-critical deep learning models on commodity…