Neural Magic Joins MLCommons to Help Accelerate ML Innovation Through Transparency and Open Source


Neural Magic Joins MLCommons 

Through research, benchmarks, and best practices, Neural Magic is committed to open standards that will guide machine learning (ML) along the path from a research field to a mature industry. In February 2021, we open-sourced our model sparsification libraries and made our sparsity-aware inference engine freely available for community use. Research from the ML community at large has been the driving force of our innovation. Today, Neural Magic joins MLCommons to reinforce our commitment to giving back to that community through open standards that will accelerate ML innovation and increase its positive impact on society.

MLCommons is an open engineering consortium of 50+ global organizations whose aim is to accelerate and democratize ML to benefit everyone through:

  • Standard benchmarks to measure progress;
  • Public datasets to fuel research; and
  • Best practices to accelerate innovation and development. 

Together with MLCommons, we believe we can unlock the next stage of AI and ML adoption by participating in open, useful measures of quality and performance, and by sharing best practices and resources that speed up innovation through efficient CPU execution.

Democratizing Efficient CPU Execution

Neural Magic has developed a sparsity-aware inference engine and open-source tools for maximizing the sparsity of neural networks while preserving accuracy. By removing unnecessary parameters (weights) and executing the network depthwise in cache (rather than the traditional layer-by-layer approach), we make natural language processing (NLP) and computer vision execution on CPUs faster and more efficient:

  • BERT-base: 10x inference throughput improvement over ONNX Runtime, using both sparsity and quantization, targeting 99% baseline accuracy
  • ResNet-50: 8x inference throughput improvement over ONNX Runtime, using both sparsity and quantization, targeting 99% baseline accuracy
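To make "sparsity and quantization" concrete, here is a minimal sketch in plain Python. It assumes unstructured magnitude pruning and symmetric per-tensor int8 quantization, two common textbook techniques; it is not taken from SparseML or DeepSparse, and the function names (`magnitude_prune`, `quantize_int8`, `dequantize`) and the sample weights are hypothetical, chosen only for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning).

    Illustrative only: real toolkits prune gradually during training so that
    accuracy can recover; this function applies a single one-shot cut.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending absolute value; the first n_prune are zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor quantization to the int8 range [-127, 127].

    Returns the integer codes and the scale needed to map them back to floats.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights for a tiny layer.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.1, -0.9]

sparse = magnitude_prune(weights, sparsity=0.5)  # half of the weights become zero
codes, scale = quantize_int8(sparse)             # remaining weights stored as int8
restored = dequantize(codes, scale)

print("pruned:   ", sparse)
print("int8:     ", codes)
print("restored: ", [round(w, 3) for w in restored])
```

Zeroed weights stay exactly zero after quantization, so a sparsity-aware engine can skip them, while the surviving weights take one byte each instead of four. How much accuracy is retained depends on the model and the training recipe, which this sketch does not address.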

Through open-source tutorials and sparsification recipes, Neural Magic helps the community replicate the above performance with ease using their own data and/or their own models. We actively participate in research at large by contributing our technical innovations and incorporating the best open-source research practices into our engine and tools. 

Michael Goin, our Head of Product Engineering, will join the MLCommons working groups to better understand how we can contribute to the community. Michael will represent Neural Magic and lead our MLPerf submission efforts.

Next Up: Submitting to MLPerf

Our goal is to join MLPerf and share our datacenter and edge inference benchmarks in the next round of submissions, scheduled for August 2022. We look forward to seeing efficient CPU execution pave the road to true ML democratization.

See the Code on GitHub 

  • SparseML: A toolkit of APIs, CLIs, scripts, and libraries that apply state-of-the-art sparsification algorithms, such as pruning and quantization, to any neural network.
  • DeepSparse: A neural network inference engine that delivers GPU-class performance for sparsified models on CPUs.
  • SparseZoo: A constantly growing repository of sparsified (pruned and pruned-quantized) models with matching sparsification recipes for neural networks. Models can be browsed and downloaded from the SparseZoo.

Further Resources

  • Try It Now: Use Cases
    • Question Answering
    • Token Classification
    • Text Classification
    • Image Classification
    • Object Detection

Join the Deep Sparse Slack Community to interact with Neural Magic’s engineers, users, and developers interested in sparsification and accelerating deep learning inference performance.