Detecting Small Objects on High-Resolution Images With SAHI and DeepSparse

05/02/23
With conventional object detection models, it can be challenging to identify small objects due to the limited number of pixels they occupy in the overall image. To help with this issue, you can use a technique like Slicing Aided Hyper Inference (SAHI), which works on top of object detection models to discover small objects without…
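As a rough illustration of how slicing-aided inference works, here is a minimal sketch using the open-source sahi package with a YOLOv5 detector. The checkpoint path, slice sizes, and threshold are illustrative assumptions, not values from the post, and the post itself pairs SAHI with DeepSparse rather than a stock PyTorch backend.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Illustrative: any detector sahi supports works here; the post uses
# DeepSparse as the backend, but a stock YOLOv5 checkpoint keeps this short.
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov5",
    model_path="yolov5s.pt",        # placeholder checkpoint
    confidence_threshold=0.4,
    device="cpu",
)

# Slice the high-resolution image into overlapping tiles, run detection on
# each tile, then merge the per-tile predictions back into full-image space.
result = get_sliced_prediction(
    "high_res_image.jpg",           # placeholder image path
    detection_model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)

for prediction in result.object_prediction_list:
    print(prediction.category.name, prediction.score.value)
```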

Neural Magic Scales up MLPerf™ Inference v3.0 Performance With Demonstrated Power Efficiency; No GPUs Needed

04/05/23
Six months ago, Neural Magic shared remarkable MLPerf results, including a 175X increase in CPU performance attained using sparsity. This breakthrough was achieved exclusively with software, using sparsity-aware inferencing techniques. The impressive outcomes showcased the potential of network sparsity to enhance the performance of machine learning models on readily available CPUs. This advancement empowers individuals…

Deploy Optimized Hugging Face Models With DeepSparse and SparseZoo

03/28/23
Pre-trained computer vision (CV) and natural language processing (NLP) models yield high accuracy in real-world applications but suffer from high latency and low throughput due to their large size. The models are also difficult and expensive to deploy. The problem is solved by reducing the models' size through pruning and reducing the precision of the weights through…
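As a sketch of what deployment looks like once a model has been pruned and quantized, the snippet below pulls a sparse transformer from the SparseZoo and serves it through a DeepSparse pipeline. The zoo stub is illustrative; browse sparsezoo.neuralmagic.com for the stubs the post actually uses.

```python
from deepsparse import Pipeline

# Illustrative SparseZoo stub for a pruned, quantized sentiment model;
# substitute a stub from sparsezoo.neuralmagic.com.
stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none"

# DeepSparse downloads the ONNX model and compiles a sparsity-aware CPU engine.
pipeline = Pipeline.create(task="sentiment_analysis", model_path=stub)

print(pipeline("Sparse models keep accuracy while cutting latency."))
```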

SparseGPT: Remove 100 Billion Parameters for Free

03/21/23
Large language models (LLMs) solve natural language processing problems with astounding accuracy. However, these models are enormous and require a lot of space, cost, and computation power to deploy. For example, the GPT-175B model has 175 billion parameters requiring 320GB of storage and at least 5 A100 GPUs with 80GB of memory each for inference.…
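A quick back-of-envelope check shows where those numbers come from, assuming half-precision (2-byte) weights:

```python
# Rough arithmetic for GPT-175B storage, assuming FP16 weights (2 bytes each).
params = 175e9
fp16_bytes = params * 2

print(f"{fp16_bytes / 2**30:.0f} GiB")          # ~326 GiB, i.e. the ~320GB cited
print(f"{fp16_bytes / (80 * 2**30):.1f} GPUs")  # ~4.4 -> at least 5 A100-80GB GPUs,
                                                # before counting activation memory
```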

Sparsify Image Classification Models Faster with SparseML and Deep Lake

03/14/23
Training time is a well-known problem when building computer vision networks such as image classification models. The problem is aggravated by the fact that image data and models are large, therefore requiring a lot of computational resources. Traditionally, these problems have been solved using powerful GPUs to load the data faster. Unfortunately, these GPUs are…
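As one way to combine the two tools, the sketch below streams a Deep Lake dataset straight into a PyTorch loader and wraps the optimizer with a SparseML pruning recipe. The dataset path, tensor names, and recipe file are illustrative assumptions based on the libraries' public examples, not code from the post.

```python
import deeplake
import torch
import torchvision.transforms as T
from torchvision.models import resnet50
from sparseml.pytorch.optim import ScheduledModifierManager

# Stream the dataset from Deep Lake's hub instead of downloading it up front.
ds = deeplake.load("hub://activeloop/cifar10-train")   # illustrative dataset
loader = ds.pytorch(
    batch_size=64,
    num_workers=2,
    transform={"images": T.Compose([T.ToTensor()]), "labels": None},
)

model = resnet50(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Apply a SparseML pruning recipe (placeholder path); the recipe schedules
# sparsity over the course of training via the wrapped optimizer.
manager = ScheduledModifierManager.from_yaml("pruning_recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(loader))

# ... run a standard training loop here ...

manager.finalize(model)
```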

Bringing Software-Delivered AI to the AWS Marketplace (Part 3 of 3-Blog Series)

03/07/23
This is the final entry in our AWS-centric blog series leading up to the AWS Startup Showcase on Thursday, March 9th. We are excited to be a part of this event with other selected visionary AI startups to talk about the future of deploying AI into production at scale. Sign up here to register for…

Build Scalable NLP and Computer Vision Pipelines With DeepSparse - Now Available From the AWS Marketplace (Part 2 of 3-Blog Series)

03/01/23
This is the second entry in our AWS-centric blog series leading up to the AWS Startup Showcase on Thursday, March 9th. We are excited to be a part of this event with other selected visionary AI startups to talk about the future of deploying AI into production at scale. Sign up here to register for this…

Neural Magic’s DeepSparse Inference Runtime Now Available in the AWS Marketplace (Part 1 of 3-Blog Series)

03/01/23
Neural Magic’s DeepSparse Inference Runtime can now be deployed directly from the AWS Marketplace. DeepSparse supports more than 60 different EC2 instance types and sizes, allowing you to quickly deploy the infrastructure that works best for your use case, based on cost and performance. In this blog post, we will illustrate how easy it is…

Neural Magic 1.4 Product Release

02/24/23
Here are highlights of the 1.4 product release of our DeepSparse, SparseML, and SparseZoo libraries. The full technical release notes are always available within our GitHub release indexes linked from the specific Neural Magic repository. If you have any questions, need assistance, or simply want to say hello to our vibrant ML performance community, join…

Process Text Faster Through Sequence Bucketing and DeepSparse

02/21/23
Simplify Pre-processing Pipelines with Sequence Bucketing to Decrease Memory Utilization and Inference Time for Efficient ML

DeepSparse is an inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application. DeepSparse has built-in performance features, like sequence bucketing, to lower latency and increase the throughput of deep learning pipelines. These features…
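For context on how bucketing is enabled, DeepSparse's transformer pipelines accept a list of sequence lengths; each incoming text is then routed to the smallest compiled bucket it fits, instead of every input being padded to one maximum length. A minimal sketch, with an illustrative zoo stub and bucket sizes:

```python
from deepsparse import Pipeline

# Passing a list of sequence lengths compiles one engine per bucket;
# the stub and bucket sizes below are illustrative.
pipeline = Pipeline.create(
    task="text_classification",
    model_path="zoo:nlp/text_classification/distilbert-none/pytorch/huggingface/sst2/pruned80_quant-none-vnni",
    sequence_length=[16, 64, 128],
)

# A short input pads only to the 16-token bucket rather than the 128 maximum.
print(pipeline(["great product", "a much longer review that needs a larger bucket to fit"]))
```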