Previously Recorded Videos

Workshop: How to Achieve the Fastest CPU Inference Performance for Object Detection YOLO Models (July 2022)

Discover how to optimize your computer vision models, apply your own data with a few lines of code, and deploy them on commodity CPUs at GPU-level speeds, all with free, open-source tools!


Workshop: How to Optimize Deep Learning Models for Production (May 2022)

Learn about the benefits and downsides of pruning, the most useful pruning algorithms and tools of 2022, and reliable ways to get production performance out of a pruned model.


How to Compress BERT NLP Models for Efficient Inference (April 2022)

Learn about state-of-the-art research on compressing BERT models 10x for much more efficient deployments and a 9-29x CPU inference speedup.


Deep Sparse Platform Demo: Build and Deploy Accurate Deep Learning Models Faster

Get an overview of Neural Magic, along with key business use cases and applications that can be powered by the Deep Sparse Platform. See a demo of the end-to-end experience in action: starting from a Neural Magic pre-trained model in the SparseZoo, applying a private dataset with a SparseML recipe, and deploying on CPUs with the DeepSparse Engine.
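
For a concrete feel of the final deployment step shown in the demo, here is a minimal sketch using the DeepSparse Python API; the SparseZoo stub, input shape, and batch size are illustrative assumptions rather than details taken from the video.

```python
import numpy as np
from deepsparse import compile_model

# Illustrative SparseZoo stub for a sparsified ResNet-50; swap in the stub
# (or local ONNX path) produced by your own SparseML run.
MODEL_STUB = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned-moderate"

# Compile the sparsified ONNX model for fast CPU inference.
engine = compile_model(MODEL_STUB, batch_size=1)

# Run inference on a random input with an assumed ImageNet-style shape.
inputs = [np.random.rand(1, 3, 224, 224).astype(np.float32)]
outputs = engine.run(inputs)
print(outputs[0].shape)
```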


Using "Compound Sparsification" with Hugging Face BERT for Faster CPU Inference with Better Accuracy

Learn what "compound sparsification" is, how we used it to accelerate Hugging Face BERT performance on CPUs by up to 14x, and how you can do the same with your private data.


Sparsifying YOLOv5 to Achieve Faster and Smaller Models

Learn how we sparsified (pruned and quantized) YOLOv5 for 10x better performance and 12x smaller model files, and how you can do the same with your private data.
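
As a rough sketch of what deploying such a model can look like, the snippet below assumes the DeepSparse Pipeline API and an illustrative SparseZoo stub for a pruned-and-quantized YOLOv5s; task names and stubs may differ across DeepSparse versions, so treat it as orientation rather than the exact workflow from the video.

```python
from deepsparse import Pipeline

# Illustrative SparseZoo stub for a pruned + quantized YOLOv5s;
# replace with the model you sparsified on your own data.
MODEL_STUB = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94"

# Create an object-detection pipeline backed by the DeepSparse Engine.
yolo_pipeline = Pipeline.create(task="yolo", model_path=MODEL_STUB)

# Run detection on a local image; results include boxes, scores, and labels.
predictions = yolo_pipeline(images=["sample.jpg"])
print(predictions)
```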


Using Sparsification Recipes with PyTorch

Sparsification recipes make model pruning and quantization simple. This video shows what sparsification recipes are and how to use them to prune and quantize PyTorch models for smaller size and better performance.
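
As a quick orientation before watching, here is a minimal sketch of how a recipe typically plugs into a PyTorch training loop via SparseML's ScheduledModifierManager; the recipe path, model, and data are placeholders, and the API shown reflects one SparseML release rather than a definitive interface.

```python
import torch
from sparseml.pytorch.optim import ScheduledModifierManager

# Placeholder model, optimizer, and data for illustration only.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_loader = [(torch.randn(8, 128), torch.randint(0, 10, (8,))) for _ in range(100)]
loss_fn = torch.nn.CrossEntropyLoss()

# Load the pruning/quantization recipe and wrap the optimizer so the
# recipe's modifiers run on schedule during training.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

for epoch in range(manager.max_epochs):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()

# Finalize to make the sparsity/quantization changes permanent.
manager.finalize(model)
```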


Introducing the Deep Sparse Platform

To help the developer community interested in accelerating machine learning performance, we’ve open-sourced our automated, recipe-driven model optimization technologies and made our CPU inference engine available for free. See our webinar recording to learn about the deep learning sparsification components that you can take advantage of immediately.


Big Brain Burnout: What's Wrong with AI Computing?

Hear from Neural Magic's award-winning co-founder why we need to fundamentally rethink how we’re building products that rely on machine learning and AI. Hint: Because if our brains processed information the same way today’s machine learning products consume computing power, you could fry an egg on your head. Spoiler: It's about memory, not raw compute.

