Neural Magic has been busy this summer working on the Community Edition (CE) of our DeepSparse Engine and SparseML libraries; we’re excited to share highlights of releases 1.0 and 1.1. Our 1.0 release was a huge milestone and we could not have gotten here without all of your support and feedback! The full technical release… Read More Neural Magic CE 1.1 and 1.0 Product Releases
Compress BERT-Large with pruning and quantization to create a version that maintains accuracy while beating baseline DistilBERT in both performance and compression. In 2019, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, a research paper from Google Research, introduced two versions of a transformative new NLP model: BERT-base and BERT-Large. Both were transformer-based architectures pre-trained on… Read More BERT-Large: Prune Once for DistilBERT Inference Performance
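For a feel of the mechanics, here is a minimal, generic sketch of unstructured magnitude pruning followed by post-training dynamic quantization using stock PyTorch and Hugging Face Transformers. It is not Neural Magic's recipe (the post's flow goes through SparseML recipes); the 90% sparsity target and the bert-large-uncased checkpoint are illustrative assumptions.

```python
# Illustrative sketch only: magnitude pruning + dynamic quantization with stock PyTorch.
import torch
from torch.nn.utils import prune
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased")

# Unstructured magnitude pruning of every Linear layer's weights.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)  # 90% sparsity target
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Post-training dynamic quantization of the remaining Linear weights to INT8.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```

In the actual post, sparsity is introduced gradually during training and paired with distillation so accuracy is recovered; the one-shot sketch above only shows how the two compression steps compose.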
This blog post was edited in July 2022 to reflect more recent sparsification research, software updates, better performance numbers, and easier benchmarking and transfer learning flows. In this post, we elaborate on how we sparsified ResNet-50 models to up to 95% sparsity while retaining 99% of the baseline accuracy. Furthermore, we’ll show how we used these sparsified models… Read More ResNet-50 on CPUs: Sparsifying for Better Performance on CPUs
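To make the inference side concrete, here is a minimal sketch of loading a sparsified ResNet-50 ONNX file into the DeepSparse Engine's Python API and running a forward pass. The file path is a placeholder and this is not the exact benchmarking flow from the post.

```python
# Minimal sketch: run a sparsified ResNet-50 ONNX model in the DeepSparse Engine.
import numpy as np
from deepsparse import compile_model

batch_size = 1
# Placeholder path; sparsified ResNet-50 checkpoints are hosted in SparseZoo.
engine = compile_model("resnet50_pruned.onnx", batch_size=batch_size)

# ResNet-50 expects NCHW float32 inputs at ImageNet resolution.
inputs = [np.random.rand(batch_size, 3, 224, 224).astype(np.float32)]
outputs = engine.run(inputs)
print(outputs[0].shape)  # class logits
```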
You can now automate the deployment of a sparse transformer model to an Amazon SageMaker endpoint. At Neural Magic, we have simplified the arduous, multi-step task of building the deployment infrastructure by distilling it down to a single CLI command. This post describes the ease of building your personal SageMaker inference endpoint… Read More Deploy Sparse DistilBERT with the DeepSparse Engine on AWS SageMaker for a 7x Increase in Performance
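For context on what that single command automates, here is a rough sketch of the equivalent manual steps written against the generic SageMaker Python SDK rather than Neural Magic's CLI; the container image URI, IAM role, instance type, and endpoint name are all placeholders.

```python
# Rough sketch of a manual SageMaker deployment using the generic SageMaker Python SDK.
# Image URI, role, and names below are placeholders, not Neural Magic's containers.
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()
model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/deepsparse-server:latest",
    role="<sagemaker-execution-role-arn>",
    sagemaker_session=session,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.2xlarge",  # CPU instance; DeepSparse targets CPUs
    endpoint_name="sparse-distilbert-qa",
)
```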
Are you heading to CVPR 2022 in New Orleans this June 19-23? So are we! And we’d love to meet you. Stop by booth #1223 and say hello. Who is Neural Magic? A team of passionate leaders with deep engineering backgrounds, Neural Magic has developed a sparsity-aware inference engine and open-source tools for maximizing the sparsity of neural… Read More Neural Magic at CVPR 2022
GPU-Level Latency on CPUs with 10x Smaller Models Using oBERT + DeepSparse. The modern world is made up of constant communication happening through text. Think messaging apps, social networks, documentation and collaboration tools, or books. This communication generates enormous amounts of actionable data for companies that wish to use it to improve their users’ experiences.… Read More oBERT: Compound Sparsification Delivers Faster Accurate Models for NLP
In alignment with AMD's latest launch, Neural Magic is pushing CPU-based neural network execution to new heights. Using only software and state-of-the-art (SOTA) sparsification research, Neural Magic achieves a 3x relative speedup in inference performance for sparse BERT NLP and ResNet-50 image classification models, with roughly a 20-25% boost attributed to the L3 cache increase from… Read More Increasing Inference Performance with Sparsity and AMD Milan-X
The full technical release notes are always found within our GitHub release indexes linked from our Docs website or the specific Neural Magic repository. SparseZoo: the latest additions to sparsezoo.neuralmagic.com include sparse BERT masked language modeling models with example recipes for transferring to other downstream datasets, pruned-quantized BERT models on SQuAD (question answering), and YOLACT for image segmentation. DeepSparse Engine: Optimization… Read More Neural Magic CE 0.7, 0.8, and 0.9 Product Releases
Pruning Hugging Face BERT: Apply both pruning and layer dropping sparsification methods to increase BERT performance anywhere from 3.3x to 14x on CPUs, depending on accuracy constraints. In this post, we go into detail on pruning Hugging Face BERT and describe how sparsification combined with the DeepSparse Engine improves BERT model performance on CPUs. We’ll… Read More Pruning Hugging Face BERT: Using Compound Sparsification for Faster CPU Inference with Better Accuracy
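As a rough illustration of the layer dropping half of that compound sparsification, here is a minimal sketch that keeps a subset of a Hugging Face BERT model's encoder layers before fine-tuning; the checkpoint and the choice of which layers to keep are illustrative assumptions, not the recipe from the post.

```python
# Minimal sketch of layer dropping: keep a subset of BERT's encoder layers.
# Which layers to keep is a hyperparameter; the indices below are illustrative.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

keep = [0, 2, 4, 6, 8, 10]  # drop every other encoder layer (12 -> 6)
model.bert.encoder.layer = torch.nn.ModuleList(
    layer for i, layer in enumerate(model.bert.encoder.layer) if i in keep
)
model.config.num_hidden_layers = len(keep)
# The shallower model is then fine-tuned (and pruned) on the downstream task.
```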
Neural Magic has been busy this summer working on the Community Edition (CE) of our DeepSparse tools; we’re excited to share highlights of releases 0.5 and 0.6. The full technical release notes are always found within our GitHub release indexes linked from our Docs website or the specific Neural Magic repository. For user help or questions… Read More Neural Magic CE 0.5 and 0.6 Product Releases