Are you attending this year’s virtual NeurIPS conference? The Neural Magic team would love to meet you.
Who is Neural Magic?
After years of research at MIT, our team concluded that throwing teraflops at dense models is not sustainable. So we’ve taken the best of known research on model compression (unstructured pruning and quantization, in particular) and efficient sparse execution to build software that delivers efficient deep neural network inference on everyday CPUs, without the need for specialized hardware.
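To make "unstructured pruning" concrete: it removes individual weights by magnitude, with no block or channel constraints, which is what enables very high sparsity levels but also what requires smart software to turn into speedup. Below is a minimal, generic magnitude-pruning sketch in NumPy; the 90% sparsity level and layer shape are illustrative placeholders, not our production recipe.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude `sparsity` fraction of `weights`."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return weights * (np.abs(weights) > threshold)

# Illustrative usage: prune a dense layer's weight matrix to ~90% sparsity.
w = np.random.randn(512, 512).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.9)
print(f"sparsity: {1 - np.count_nonzero(w_sparse) / w_sparse.size:.2%}")
```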
Even though the format of this year’s NeurIPS is a little different, our team still has a jam-packed agenda. Here’s how to join us.
Expo Demonstration: Using Sparse Quantization for Efficient Inference on Deep Neural Networks
Today’s state of deep neural network inference can be summed up in two words: complex and inefficient. The quest for accuracy has led to deep neural networks that require heavy compute resources to solve the tasks at hand, resulting in unsustainable computational, economic, and environmental costs. And according to a survey from earlier this year, putting the extensive published research on model compression into practice is often out of reach for under-resourced teams.
Join Neural Magic ML experts to learn how we successfully applied published research on model compression and efficient sparse execution to build software that compresses and optimizes deep learning models for efficient inference with ease. You’ll walk away with:
- An overview of SOTA model compression techniques;
- A demo of the first-ever general-purpose inference engine that translates high sparsity levels into significant speedups; and
- Next steps for using the Neural Magic Inference Engine and ML tools to make your inference efficient, with less complexity.
Date: Sunday, December 6
Time: 12:00pm - 1:00pm PT (3:00pm - 4:00pm ET)
Neural Magic Demo(s): Optimize DL Models with Ease, for Free
We are excited to show you our model compression and sparse execution software in action, and answer any questions you might have. We’ll be doing so during the following times:
Date: Tuesday, December 08 (Session #1)
Time: 11:00am PT - 11:30am PT (2:00pm ET - 2:30pm ET)
Location: Zoom
Date: Thursday, December 10 (Session #1)
Time: 11:00am PT - 11:30am PT (2:00pm ET - 2:30pm ET)
Location: Zoom
Date: Thursday, December 10 (Session #2)
Time: 4:00pm PT - 4:30pm PT (7:00pm ET - 7:30pm ET)
Location: Zoom
Presenting our Research on Model Compression
Beyond our expo demonstration and demos, our resident model compression expert, Dan Alistarh, will be presenting his team’s research in three core areas:
WoodFisher: Efficient Second-Order Approximation for Neural Network Compression
Tuesday, Dec. 08 | 09:00 AM -- 11:00 AM (PST) | Poster Session 1 #286
Second-order information is a fundamental tool for solving optimization problems. Recently, there has been significant interest in utilizing this information for deep neural networks; however, relatively little is known about the quality of existing approximations in this setting. Our work examines this question, identifies issues with existing approaches, and proposes a method called WoodFisher to compute a faithful and efficient estimate of the inverse Hessian.
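For a flavor of the idea behind the name: the empirical Fisher matrix F = λI + (1/N) Σᵢ gᵢgᵢᵀ, built from per-sample gradients gᵢ, can be inverted incrementally with rank-one Woodbury (Sherman-Morrison) updates rather than ever forming and inverting F directly. The sketch below is a minimal dense-matrix illustration of that recurrence, assuming a small dimensionality and a placeholder damping value; the paper’s actual implementation uses blocking and other refinements to scale to real networks.

```python
import numpy as np

def inverse_empirical_fisher(grads: np.ndarray, damping: float = 1e-4) -> np.ndarray:
    """Approximate F^{-1} for F = damping * I + (1/N) * sum_i g_i g_i^T,
    where `grads` is an (N, d) array of per-sample gradients."""
    n, d = grads.shape
    f_inv = np.eye(d) / damping  # inverse of the damped identity term
    for g in grads:
        fg = f_inv @ g
        # Sherman-Morrison rank-one update:
        # (F + g g^T / N)^{-1} = F^{-1} - (F^{-1} g)(F^{-1} g)^T / (N + g^T F^{-1} g)
        f_inv -= np.outer(fg, fg) / (n + g @ fg)
    return f_inv

# Illustrative usage on random "gradients".
f_inv = inverse_empirical_fisher(np.random.randn(100, 64))
```

An inverse-Hessian estimate of this kind is what Optimal-Brain-Surgeon-style compression uses to decide which weight to remove and how to adjust the remaining weights to compensate for the removal.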
Adaptive Gradient Quantization for Data-Parallel SGD
Wednesday, Dec. 09 | 09:00 AM -- 11:00 AM (PST) | Poster Session 3 #842
Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during training. This session will introduce two adaptive quantization schemes, ALQ and AMQ.
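To ground the terminology, here is a minimal sketch of the kind of fixed, non-adaptive scheme being improved upon: an unbiased stochastic uniform quantizer in the spirit of QSGD. The level count and max-norm scaling are illustrative choices; ALQ and AMQ instead adapt the quantization levels to the observed gradient statistics as they shift during training.

```python
import numpy as np

def quantize_gradient(grad: np.ndarray, levels: int = 16, rng=np.random) -> np.ndarray:
    """Stochastically round gradient magnitudes onto `levels` uniform levels.
    Unbiased: E[quantize_gradient(g)] == g."""
    scale = np.max(np.abs(grad))
    if scale == 0:
        return grad.copy()
    normalized = np.abs(grad) / scale * (levels - 1)
    lower = np.floor(normalized)
    # Round up with probability equal to the fractional part, so the
    # quantizer is unbiased in expectation.
    q = lower + (rng.random(grad.shape) < normalized - lower)
    return np.sign(grad) * q / (levels - 1) * scale
```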
Scalable Belief Propagation via Relaxed Scheduling
Thursday, Dec. 10 | 09:00 AM -- 11:00 AM (PST) | Poster Session 5 #1522
The ability to leverage large-scale hardware parallelism has been one of the key enablers of the rapid recent progress in machine learning. Despite the wealth of knowledge on parallelization, classic machine learning algorithms often prove hard to parallelize efficiently while maintaining convergence. In this session, we focus on efficient parallel algorithms for the key machine learning task of inference on graphical models.
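As background for the scheduling question: "residual" belief propagation always applies the message update with the largest pending change, and that strict priority order is precisely what is hard to parallelize. The sketch below shows the sequential baseline; `compute_update` and `neighbors` are hypothetical interfaces standing in for the model-specific message computation and factor-graph structure. The paper’s contribution, roughly, is replacing the exact priority queue with a relaxed concurrent one that tolerates slightly out-of-order updates.

```python
import heapq

def residual_bp(messages, compute_update, neighbors, tol=1e-6, max_steps=10_000):
    """messages: dict edge -> current value (edge keys assumed orderable);
    compute_update(edge, messages) -> recomputed value for that edge;
    neighbors(edge) -> edges whose residuals change when `edge` is updated."""
    heap = [(-abs(compute_update(e, messages) - m), e) for e, m in messages.items()]
    heapq.heapify(heap)
    steps = 0
    while heap and steps < max_steps:
        neg_res, edge = heapq.heappop(heap)
        if -neg_res < tol:  # largest pending residual is tiny: converged
            break
        messages[edge] = compute_update(edge, messages)
        # Lazily re-prioritize affected edges; stale heap entries are
        # harmless because popping one just re-applies a (near no-op) update.
        for e in neighbors(edge):
            res = abs(compute_update(e, messages) - messages[e])
            if res >= tol:
                heapq.heappush(heap, (-res, e))
        steps += 1
    return messages
```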
Join us at NeurIPS
Check out the sessions above, and drop by our virtual booth to say hello.
P.S. We’ll make a $5 donation to multiple causes for every individual who stops by our booth and helps us understand the deep learning space by filling out our 2-minute survey.

Accomplishing our Mission
At the end of January 2021, Neural Magic plans to open source portions of its software and make them available on GitHub. This moves us closer to achieving our mission of shattering the hardware barriers holding back the field of machine learning by making the power of deep learning simple, accessible, and affordable for anyone.
Please fill out the form below to receive a one-time email when our engine and deep learning optimization software are ready for download.