Sparsify is Open Sourced - Try it Now


Neural Magic is excited to release the next generation of Sparsify, a tool that enables you to accelerate inference without sacrificing accuracy. Using a simple web app and one-command API calls, you can apply state-of-the-art pruning, quantization, and distillation algorithms to your neural networks.

Want to jump right in? Get started by creating an account.

Want to kick the tires? Read the Sparsify Quickstart Guide.

Have fun optimizing your own models and accelerating inference!

Today, we are very excited to provide you with early access to Sparsify, our automated model optimization tool!

As deep learning models continue to grow in size, deploying and running them performantly and accurately has required significant investments in compute and system resources. Take GPT-3, for example: with over 175 billion parameters, it requires nearly 350GB of memory and an estimated $12M just to train.

Enter sparsity.

Model sparsity is an emerging technique in the industry for removing redundant weights from over-parameterized layers in a neural network, ultimately compressing the network and driving performance gains.
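To make the idea concrete, here is a minimal sketch of unstructured magnitude pruning, the most common way to induce this kind of sparsity: the smallest-magnitude weights are zeroed out until a target fraction of the layer is sparse. This is an illustrative toy, not Sparsify's implementation.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights so that `sparsity`
    fraction of them become zero (unstructured magnitude pruning)."""
    n_prune = int(len(weights) * sparsity)
    # Indices of the n_prune smallest-magnitude weights
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

w = [0.9, -0.02, 0.5, 0.01, -0.7, 0.03]
print(magnitude_prune(w, 0.5))  # -> [0.9, 0.0, 0.5, 0.0, -0.7, 0.0]
```

In practice, pruning like this is applied gradually over the course of training so the network can recover accuracy as weights are removed.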

Unfortunately, techniques and tools to take advantage of a model's natural sparsity are either in their infancy, force unfavorable trade-offs between accuracy and performance, or demand significant skills and investment in state-of-the-art industry research.

With Sparsify, we are creating an automated tool that generates sparse deep learning networks, guided by a visual interface to analyze model performance, simplify accuracy-performance trade-offs, and integrate into existing training pipelines with a few lines of code. State-of-the-art algorithms and techniques are wrapped in recipes designed for data scientists of all skill levels, for ease of use and fast time to value.
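A recipe is a declarative description of the optimizations to apply during training. As a hedged illustration of the general shape such a recipe can take (the modifier names and hyperparameter values here are examples, not a definitive Sparsify output), a gradual magnitude-pruning recipe might look like:

```yaml
# Example recipe: gradually prune all layers from 5% to 85% sparsity
# over 30 epochs. Values are illustrative.
modifiers:
  - !GMPruningModifier
    start_epoch: 0.0
    end_epoch: 30.0
    init_sparsity: 0.05
    final_sparsity: 0.85
    update_frequency: 1.0
    params: __ALL__
```

The training loop stays unchanged; the recipe drives when and how aggressively each layer is pruned.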

What’s more, we have open sourced it! See how we do it, contribute back, and help us build the sparse model community!

These sparse models can then be deployed into your CPU or GPU pipeline of choice. We also offer our inference engine, which takes sparse models and builds an optimized execution plan for them on CPUs. We have made significant progress and can deliver GPU-class performance with the flexibility to run on CPUs in the cloud, in the datacenter, or on embedded devices.

We are excited to have you try Sparsify! Check out the docs to learn more. We would love to hear about your experience and set up a deeper conversation. You can reach out directly to me or the team at [email protected], and you are also welcome to file any issues directly on GitHub.

Here are several Sparsify screenshots: