Pre-trained computer vision (CV) and natural language processing (NLP) models yield high accuracy in real-world applications but suffer from high latency and low throughput due to their large size. The models are also difficult and expensive to deploy. The problem can be addressed by reducing the models' size through pruning and reducing the precision of the weights through… Read More Deploy Optimized Hugging Face Models With DeepSparse and SparseZoo
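The two compression steps named above can be sketched in a few lines of plain Python. This is a minimal illustration, not SparseZoo's or DeepSparse's actual API: `magnitude_prune` zeroes the smallest-magnitude weights, and `quantize_int8` stores the survivors at 8-bit precision, both hypothetical helper names.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize_int8(weights):
    """Map floats to int8 values plus one float scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values."""
    return [q * scale for q in quantized]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.03]
pruned = magnitude_prune(w, 0.5)   # -> [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
q, s = quantize_int8(pruned)       # int8 values take 4x less space than float32
```

Pruning and quantization compound: zeroed weights can be skipped entirely at inference time, and the remaining weights each cost a quarter of the memory.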
Large language models (LLMs) solve natural language processing problems with astounding accuracy. However, these models are enormous and require a lot of space, cost, and computation power to deploy. For example, the GPT-175B model has 175 billion parameters, requiring 320GB of storage and at least 5 A100 GPUs with 80GB of memory each for inference.… Read More SparseGPT: Remove 100 Billion Parameters for Free
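The storage and GPU figures quoted above follow from simple arithmetic, assuming the weights are stored in half precision (2 bytes per parameter):

```python
import math

PARAMS = 175e9          # GPT-175B parameter count
BYTES_PER_PARAM = 2     # assuming half-precision (FP16) weights

total_bytes = PARAMS * BYTES_PER_PARAM            # 3.5e11 bytes
storage_gib = total_bytes / 2**30                 # ~326 GiB, the "320GB" figure
gpus_needed = math.ceil(total_bytes / 80e9)       # 80GB A100s for the weights alone -> 5
```

Note this counts only the weights; activations and KV caches push the real memory requirement higher, which is why pruning away half the parameters matters so much for deployment.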
Long training times are a well-known problem for computer vision networks such as image classification models. The problem is aggravated by the fact that image data and models are large and therefore require substantial computational resources. Traditionally, these problems have been addressed with powerful GPUs that load and process the data faster. Unfortunately, these GPUs are… Read More Sparsify Image Classification Models Faster with SparseML and Deep Lake
This is the final entry in our AWS-centric blog series leading up to the AWS Startup Showcase on Thursday, March 9th. We are excited to be a part of this event with other selected visionary AI startups to talk about the future of deploying AI into production at scale. Sign up here to register for… Read More Bringing Software-Delivered AI to the AWS Marketplace
Neural Magic Announces MLPerf Inference Benchmarks, Delivered Purely in Software Somerville, Massachusetts, September 8, 2022 - Neural Magic, the company leading a new software-delivered AI movement by bringing hyper-performant and scalable ML inferencing to commodity CPU infrastructure, announced today its benchmark results for three Natural Language Processing (NLP) models submitted to the MLPerf Inference Datacenter… Read More Neural Magic Introduces Sparsity to MLPerf, Boosting CPU Performance 175x
Neural Magic has been busy this summer working on the Community Edition (CE) of our DeepSparse Engine and SparseML libraries; we’re excited to share highlights of releases 1.0 and 1.1. Our 1.0 release was a huge milestone and we could not have gotten here without all of your support and feedback! The full technical release… Read More Neural Magic CE 1.1 and 1.0 Product Releases
This blog post was edited in July 2022 to reflect more-recent sparsification research, software updates, better performance numbers, and easier benchmarking and transfer learning flows. In this post, we elaborate on how we sparsified ResNet-50 models up to 95% while retaining 99% of the baseline accuracy. Furthermore, we’ll show how we used these sparsified models… Read More ResNet-50 on CPUs: Sparsifying for Better Performance on CPUs
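"95% sparse" simply means that 95% of a model's weights are exactly zero, which lets a sparsity-aware engine skip them and store only the survivors. A minimal sketch of both ideas, using illustrative helper names rather than SparseML's API:

```python
def sparsity(weights):
    """Fraction of zero-valued entries in a flat weight list."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

def to_sparse(weights):
    """Keep only nonzero entries as (index, value) pairs."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]

w = [0.0] * 19 + [1.0]     # toy layer: 19 zeros, 1 nonzero weight
print(sparsity(w))          # 0.95 -> only 1 of 20 values must be stored
print(len(to_sparse(w)))    # 1
```

At 95% sparsity a layer stores and multiplies 20x fewer values, which is the source of the CPU speedups the post reports.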
Are you heading to CVPR 2022 in New Orleans this June 19-23? So are we! And we’d love to meet you. Stop by booth #1223 and say hello. Who is Neural Magic? Founded by passionate leaders with deep engineering backgrounds, Neural Magic has developed a sparsity-aware inference engine and open-source tools for maximizing the sparsity of neural… Read More Neural Magic at CVPR 2022
GPU-Level Latency on CPUs With 10x Smaller Models Using oBERT + DeepSparse. The modern world is made up of constant communication happening through text. Think messaging apps, social networks, documentation and collaboration tools, or books. This communication generates enormous amounts of actionable data for companies that wish to use it to improve their users’ experiences.… Read More oBERT: Compound Sparsification Delivers Faster Accurate Models for NLP
In alignment with AMD's latest launch, Neural Magic is pushing CPU-based neural network execution to new heights. Using only software and SOTA sparsification research, Neural Magic achieves a 3x relative speedup of inference performance for sparse BERT NLP and ResNet-50 image classification models, with a 20-25% boost attributed to the L3 cache increase from… Read More Increasing Inference Performance with Sparsity and AMD Milan-X