Come Build the Future with Labs by Neural Magic
Architect and deploy better machine learning solutions with industry experts. Today, Neural Magic is debuting a new offering that lets you deliver best-in-class ML solutions, leveraging the same engineering talent behind our DeepSparse Engine. Labs by Neural Magic empowers organizations to define (or refine) their AI/ML best practices. Teams will develop their methodology, success…
Deploy Sparse DistilBERT with the DeepSparse Engine on AWS SageMaker for a 7x Increase in Performance
You can now automate the deployment of a sparse transformer model with an Amazon SageMaker endpoint. At Neural Magic, we have simplified the arduous task of building the infrastructure (often a multi-step process) by distilling it down to a single CLI command. This post describes how easy it is to build your personal SageMaker inference endpoint…
Neural Magic Joins MLCommons to Help Accelerate ML Innovation Through Transparency and Open Source
Neural Magic Joins MLCommons
Through research, benchmarks, and best practices, Neural Magic is committed to open standards that will guide machine learning (ML) along the path from a research field to a mature industry. In February 2021, we open-sourced our model sparsification libraries and made our sparsity-aware inference engine freely available for community use.…
Video: Azure, AMD, and Neural Magic Raise the Bar for High-Performance Computing
Microsoft, AMD, and Neural Magic are raising the bar for high-performance computing. With a combination of HBv3 virtual machines and our sparsity-aware inference engine, we are able to run deep learning workloads on CPUs at speeds previously reserved only for GPUs. For example, together we deliver 5x inference speedup for BERT NLP models over other…
Neural Magic CE 0.7, 0.8, and 0.9 Product Releases
The full technical release notes are always found within our GitHub release indexes, linked from our Docs website or the specific Neural Magic repository.

SparseZoo
The latest additions to sparsezoo.neuralmagic.com:
- Sparse BERT masked language modeling models, with example recipes for transferring to other downstream datasets
- Pruned-quantized BERT models on SQuAD (question answering)
- YOLACT, for image segmentation

DeepSparse Engine
Optimization…
Sparsify Hugging Face BERT for Better CPU Performance & Smaller File Size
Get Started: Sparsify Hugging Face BERT Using Your Data
Check out our previous blog post to learn about compound sparsification and how it enables faster and smaller BERT models: Pruning Hugging Face BERT: Using Compound Sparsification for Faster CPU Inference with Better Accuracy. Ready to sparsify Hugging Face BERT? You can replicate the performance and…
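Sparsification with SparseML is driven by recipes: declarative YAML files that schedule pruning during training. As a rough illustration only (the actual recipes ship alongside the SparseZoo models and are more detailed; every epoch and sparsity value below is a placeholder), a gradual magnitude-pruning recipe looks roughly like this:

```yaml
# Illustrative sketch of a SparseML pruning recipe -- placeholder values,
# not a recipe published by Neural Magic.
training_modifiers:
  - !EpochRangeModifier
    start_epoch: 0.0
    end_epoch: 30.0

pruning_modifiers:
  - !GMPruningModifier
    start_epoch: 1.0
    end_epoch: 20.0
    init_sparsity: 0.05
    final_sparsity: 0.85
    update_frequency: 0.5
    params: __ALL_PRUNABLE__
```

The idea is that the training loop stays yours; the recipe layers gradual magnitude pruning on top of it, ramping sparsity from 5% to 85% between the scheduled epochs.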
Neural Magic Announces $30 Million Series A Funding Led by NEA
Neural Magic, the AI company building a software platform for deep learning inference, today announced a $30 million Series A funding round led by existing investor NEA with participation from Andreessen Horowitz, Amdocs, Comcast Ventures, Pillar VC, and Ridgeline Ventures. This financing brings the company’s total amount raised to $50 million. The new capital will…
Pruning Hugging Face BERT: Using Compound Sparsification for Faster CPU Inference with Better Accuracy
Apply both pruning and layer-dropping sparsification methods to increase BERT performance anywhere from 3.3x to 14x on CPUs, depending on accuracy constraints.
In this post, we go into detail on pruning Hugging Face BERT and describe how sparsification combined with the DeepSparse Engine improves BERT model performance on CPUs. We’ll…
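To make the pruning half of compound sparsification concrete, here is a toy, framework-free sketch of unstructured magnitude pruning. The function name and inputs are illustrative, not Neural Magic's API; in practice the sparsity is applied gradually during training and per layer rather than in one shot:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (toy sketch)."""
    k = int(len(weights) * sparsity)  # number of weights to zero
    if k == 0:
        return list(weights)
    # Threshold: the (k+1)-th smallest magnitude; anything strictly
    # below it is removed, leaving roughly `sparsity` of weights at zero.
    threshold = sorted(abs(w) for w in weights)[k]
    return [0.0 if abs(w) < threshold else w for w in weights]

# Zero half of six weights: the three smallest magnitudes are dropped.
print(magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7, 0.1], 0.5))
# → [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

Zeroed weights contribute nothing to matrix multiplies, which is what a sparsity-aware runtime like the DeepSparse Engine exploits to skip work on CPUs.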
Neural Magic CE 0.5 and 0.6 Product Releases
Neural Magic has been busy this summer on the Community Edition (CE) of our DeepSparse tools; we’re excited to share highlights of releases 0.5 and 0.6. The full technical release notes are always found within our GitHub release indexes linked from our Docs website or the specific Neural Magic repository. For user help or questions…
YOLOv5 on CPUs: Sparsifying to Achieve GPU-Level Performance and a Smaller Footprint
For YOLOv3, read our previous blog: YOLOv3 on CPUs: Sparsifying to Achieve GPU-Level Performance.
Prune and quantize YOLOv5 for a 10x increase in performance with 12x smaller model files. Neural Magic improves YOLOv5 model performance on CPUs by using state-of-the-art pruning and quantization techniques combined with the DeepSparse Engine. In this blog post, we'll cover our…
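Quantization, the other half of the optimization, maps floating-point values to 8-bit integers, which shrinks model files and speeds up CPU math. The following is a minimal, self-contained sketch of naive symmetric per-tensor INT8 quantization; it is illustrative only, since the post's models use calibrated quantization-aware training rather than this one-shot scheme:

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: floats -> [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0  # one scale for the tensor
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the INT8 values."""
    return [q * scale for q in quantized]

q, scale = quantize_int8([1.0, -1.0, 0.25, 0.0])
print(q)  # → [127, -127, 32, 0]
```

Each INT8 value takes a quarter of the space of a float32 weight; combined with pruning (zeros compress essentially for free), that is the rough intuition behind the 12x smaller model files the post reports.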