
Neural Magic Introduces Sparsity to MLPerf, Boosting CPU Performance 175x

Neural Magic Announces MLPerf Inference Benchmarks, Delivered Purely in Software

Somerville, Massachusetts, September 8, 2022 - Neural Magic, the company leading a new software-delivered AI movement by bringing hyper-performant and scalable ML inferencing to commodity CPU infrastructure, announced today its benchmark results for three Natural Language Processing (NLP) models submitted to the MLPerf Inference Datacenter…

ResNet-50 on CPUs: Sparsifying for Better Performance on CPUs

This blog post was edited in July 2022 to reflect more recent sparsification research, software updates, better performance numbers, and easier benchmarking and transfer learning flows. In this post, we elaborate on how we sparsified ResNet-50 models up to 95% while retaining 99% of the baseline accuracy. Furthermore, we’ll show how we used these sparsified models…
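
Below is a minimal, hedged sketch of the kind of CPU benchmarking flow the post describes, assuming the open-source deepsparse Python package. The SparseZoo stub string and the use of timed_run are illustrative assumptions, not taken from the post itself.

```python
# A minimal sketch (hedged): benchmark a ~95%-pruned ResNet-50 on CPU
# with the DeepSparse engine. The SparseZoo stub below is illustrative;
# substitute any pruned ResNet-50 stub or a local ONNX file path.
import numpy as np
from deepsparse import compile_model

# Illustrative SparseZoo stub for a pruned ResNet-50 (assumption).
stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none"

batch_size = 1
engine = compile_model(stub, batch_size=batch_size)

# Random ImageNet-shaped input (NCHW, float32), just to exercise the engine.
inputs = [np.random.rand(batch_size, 3, 224, 224).astype(np.float32)]

outputs, seconds = engine.timed_run(inputs)
print(f"latency: {seconds * 1000:.2f} ms, output shape: {outputs[0].shape}")
```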

Neural Magic at CVPR 2022

Are you heading to CVPR 2022 in New Orleans this June 19-23? So are we! And we’d love to meet you. Stop by booth #1223 and say hello. Who is Neural Magic? Passionate leaders with deep engineering backgrounds, Neural Magic has developed a sparsity-aware inference engine and open-source tools for maximizing the sparsity of neural…

oBERT: Compound Sparsification Delivers Faster Accurate Models for NLP

GPU-Level Latency on CPUs With 10x Smaller Models Using oBERT + DeepSparse

The modern world is made up of constant communication happening through text. Think messaging apps, social networks, documentation and collaboration tools, or books. This communication generates enormous amounts of actionable data for companies that wish to use it to improve their users’ experiences…
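
As a hedged illustration of the oBERT + DeepSparse workflow the post refers to, the sketch below runs a pruned-quantized BERT question-answering model through a DeepSparse Pipeline on CPU. The SparseZoo stub, question, and context are placeholders, not taken from the post.

```python
# A hedged sketch: serve a pruned-quantized BERT question-answering model
# on CPU via a DeepSparse Pipeline. The stub, question, and context are
# placeholders (assumptions), not values from the post.
from deepsparse import Pipeline

stub = (
    "zoo:nlp/question_answering/obert-base/pytorch/huggingface/"
    "squad/pruned90_quant-none"  # illustrative stub
)

qa = Pipeline.create(task="question-answering", model_path=stub)

result = qa(
    question="What does compound sparsification combine?",
    context="Compound sparsification layers pruning and quantization "
            "together to shrink BERT-style models roughly tenfold.",
)
print(result.answer)
```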

Increasing Inference Performance with Sparsity and AMD Milan-X

In alignment with AMD's latest launch, Neural Magic is pushing CPU-based neural network execution to new heights. Using only software and state-of-the-art sparsification research, Neural Magic achieves a 3x relative speedup in inference performance for sparse BERT NLP and ResNet-50 image classification models, with a 20-25% boost attributed to the L3 cache increase from…

Accelerating Machine Learning Inference on CPU with VMware vSphere and Neural Magic

This blog was originally posted by Na Zhang on VMware's Office of the CTO Blog. You can see the original copy here. Increasingly large deep learning (DL) models require a significant amount of computing, memory, and energy, all of which become a bottleneck in real-time inference where resources are limited. In this post, we detail our…

Sparsify is Open Sourced - Try it Now

Today, we are very excited to provide you with early access to Sparsify, our automated model optimization tool! As deep learning models continue to grow in size, deploying and running them performantly and accurately has required significant investments in FLOPs and system resources. Take GPT-3, for example: with over 175 billion parameters, it takes nearly…

Product Release Notes

Release 0.1.0 for the Community! February 4, 2021 - As of February 2021, our products have been renamed, most have been open sourced, and their release notes can be found on GitHub: Sparsify, SparseML (formerly Neural Magic ML Tooling), SparseZoo (formerly Neural Magic Model Repo), and DeepSparse Engine (formerly Neural Magic Inference Engine). Release 1.4.0 January…

Neural Magic at NeurIPS 2020

Are you attending this year’s virtual NeurIPS conference? The Neural Magic team would love to meet you. Who is Neural Magic? After years of research at MIT, our team concluded that throwing teraflops at dense models is not sustainable. So we’ve taken the best of known research on model compression (unstructured pruning and quantization, in…
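
For readers curious what recipe-driven unstructured pruning looks like in practice, here is a minimal sketch using SparseML's PyTorch integration; the recipe hyperparameters and training-loop placement are illustrative assumptions, not values from the post.

```python
# A hedged sketch of recipe-driven unstructured pruning with SparseML's
# PyTorch integration; all hyperparameters here are illustrative.
import torch
from torchvision.models import resnet50
from sparseml.pytorch.optim import ScheduledModifierManager

# Gradual-magnitude pruning recipe: ramp sparsity from 5% to 90%
# over 30 epochs across all prunable layers (placeholder values).
recipe = """
modifiers:
  - !GMPruningModifier
    init_sparsity: 0.05
    final_sparsity: 0.90
    start_epoch: 0.0
    end_epoch: 30.0
    update_frequency: 1.0
    params: __ALL_PRUNABLE__
"""
with open("recipe.yaml", "w") as f:
    f.write(recipe)

model = resnet50()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Wrap the optimizer so pruning masks are updated on schedule each step.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=1000)

# ... standard training loop runs here, unchanged ...

manager.finalize(model)  # make the pruned (zeroed) weights permanent
```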