|
Neural Magic has added support for large language models (LLMs) in DeepSparse, enabling inference speed-ups from compression techniques like SparseGPT on commodity CPUs.
Speed up your LLMs with SparseGPT and DeepSparse on CPUs
SparseGPT: Prune and Quantize LLMs Quickly With One-Shot. State-of-the-art language models are very large, with parameter counts in the billions. Deploying one is expensive and often requires multiple GPUs just…
|
Build Scalable NLP and Computer Vision Pipelines With DeepSparse - Now Available From the Google Cloud Marketplace
This is the second entry in our Google Cloud blog series. We recently launched our DeepSparse Inference Runtime on the Google Cloud Marketplace so that ML practitioners can deploy their models with just a few clicks. Latency, accuracy, and inference costs are all critical when deploying natural language processing (NLP)…
|
Neural Magic’s DeepSparse Inference Runtime Now Available in the Google Cloud Marketplace
Neural Magic's DeepSparse Inference Runtime can now be deployed directly from the Google Cloud Marketplace. DeepSparse supports various machine types on Google Cloud, so you can quickly deploy the infrastructure that works best for your use case, based on cost and performance. In this blog post, we will illustrate how easy it is to get…
|
Deploy Serverless Machine Learning Inference on AWS with DeepSparse
This blog, originally posted in December 2022, was edited in May 2023 to reflect updates to the "Batch Deployment Flow" section and GitHub repo links. By leveraging the advantages of serverless computing, developers can deploy and manage AI-driven applications with unprecedented efficiency, scalability, and cost-effectiveness. With serverless deployments, machine learning inference can execute two…
|
How to Achieve Up to 3X AI Speedup on DigitalOcean's Premium CPUs
As organizations continue to explore and invest in AI to advance their productivity and bottom lines, it's become clear that deploying models can put quite a dent in IT budgets. Organizations want the option to use cost-efficient commodity CPUs to support AI development. One option for companies to consider is DigitalOcean's Premium CPU-Optimized Droplets, launched…
|
Detecting Small Objects on High-Resolution Images With SAHI and DeepSparse
With conventional object detection models, it can be challenging to identify small objects because they occupy so few pixels in the overall image. To help with this issue, you can use a technique such as Slicing Aided Hyper Inference (SAHI), which works on top of object detection models to discover small objects without…
|
Neural Magic Scales up MLPerf™ Inference v3.0 Performance With Demonstrated Power Efficiency; No GPUs Needed
Six months ago, Neural Magic shared remarkable MLPerf results, with a 175X increase in CPU performance attained using sparsity. This breakthrough was achieved exclusively with software, using sparsity-aware inferencing techniques. The impressive outcomes showcased the potential of network sparsity to enhance the performance of machine learning models on readily available CPUs. This advancement empowers individuals…
|
Deploy Optimized Hugging Face Models With DeepSparse and SparseZoo
Pre-trained computer vision (CV) and natural language processing (NLP) models yield high accuracy in real-world applications but suffer from high latency and low throughput due to their large size. The models are also difficult and expensive to deploy. This problem can be addressed by reducing the models' size through pruning and reducing the precision of the weights through…
|
Sparsify Image Classification Models Faster with SparseML and Deep Lake
Training time is a well-known problem when training computer vision networks such as image classification models. The problem is aggravated by the fact that image data and models are large and therefore require a lot of computational resources. Traditionally, these problems have been addressed by using powerful GPUs to load the data faster. Unfortunately, these GPUs are…
|
Bringing Software-Delivered AI to the AWS Marketplace (Part 3 of 3-Blog Series)
This is the final entry in our AWS-centric blog series leading up to the AWS Startup Showcase on Thursday, March 9th. We are excited to be a part of this event with other selected visionary AI startups to talk about the future of deploying AI into production at scale. Sign up here to register for…