Latest MLPerf™ Inference v3.1 Results Show 50X Faster AI Inference for x86 and ARM from Neural Magic
Neural Magic continues to push the limits of AI performance, as evidenced by our MLPerf™ Inference v2.1 and v3.0 results. What began for founders Nir Shavit and Alexander Matveev as a workaround for limited research hardware is now a solution for customers that need a more cost-efficient, yet still performant, alternative to GPUs…
Unleash Software-Accelerated AI Inference with Neural Magic on HPE ProLiant Gen11 Servers Powered by 4th Gen AMD EPYC Processors
HPE ProLiant servers with AMD EPYC™ CPUs provide exceptional value for AI workloads, especially when combined with Neural Magic, unlocking incredible levels of performance for AI inference. Ever-larger machine learning (ML) models place ever-larger demands on hardware. Neural Magic helps alleviate those demands with a software-accelerated AI inference solution that delivers impressive ML…
Scaling CPU Inference on AWS EKS with DeepSparse
Amazon Web Services (AWS) Elastic Kubernetes Service (EKS) and Neural Magic's DeepSparse work together seamlessly to provide customers with impactful deep learning inference for production environments. This paired solution offers an accessible and efficient alternative to the default hardware choices often used in large-scale machine learning deployments. Advantages of integrating these two technologies include improved…
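As a rough illustration of the pattern that post describes, here is a minimal sketch of a Python client querying a DeepSparse Server running behind a Kubernetes Service on EKS. The Service name, port, endpoint path, and request schema below are assumptions for illustration, not details taken from the post.

```python
# Minimal sketch: query a DeepSparse Server pod exposed through a Kubernetes
# Service on EKS. Assumes the server was started inside the pod with the
# `deepsparse.server` CLI; the hostname, port, and /predict path are
# illustrative assumptions, not values from the post.
import requests

SERVICE_URL = "http://deepsparse-service:5543/predict"  # hypothetical Service name

def classify(text: str) -> dict:
    # Assumes a sentiment-analysis pipeline accepting a JSON body with "sequences".
    response = requests.post(SERVICE_URL, json={"sequences": text}, timeout=10)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(classify("DeepSparse on EKS keeps inference costs predictable."))
```

Because the server pods are stateless, horizontal scaling reduces to adjusting the Deployment's replica count and letting the Service load-balance requests across pods.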
Neural Magic 1.5 Product Release
Here are highlights of the 1.5 product release of the DeepSparse, SparseML, and SparseZoo libraries. The full technical release notes are always available within our GitHub release indexes, linked from the specific Neural Magic repository. Join us in the Neural Magic Community Slack if you have any questions, need assistance, or simply want to introduce yourself. For…
Advancing AI Inference Density with Neural Magic and AMD
In an era where the hunger for data is driving an exponential surge in computational demand, organizations are realizing the need for power-efficient commodity compute to support artificial intelligence (AI) workloads. This is the basis of our work at Neural Magic, as we help customers optimize IT infrastructure to support AI projects and to drive…
Speed up your LLMs with SparseGPT and DeepSparse on CPUs
Neural Magic has added support for large language models (LLMs) in DeepSparse, enabling inference speed-ups from compression techniques like SparseGPT on commodity CPUs.
SparseGPT: Prune and Quantize LLMs Quickly in One Shot
State-of-the-art language models are very large, with parameter counts in the billions. Deploying one is expensive and often requires multiple GPUs just…
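As a minimal sketch of what running a SparseGPT-compressed LLM in DeepSparse can look like, the snippet below uses the pipeline API. The task name, the input keyword, and the SparseZoo model stub are assumptions for illustration; substitute a real SparseGPT-pruned model stub or a local ONNX deployment directory.

```python
# Minimal sketch: text generation with a sparse LLM in DeepSparse.
# The model stub below is a placeholder, not a real SparseZoo entry.
from deepsparse import Pipeline

pipeline = Pipeline.create(
    task="text-generation",                       # illustrative task name
    model_path="zoo:some/sparsegpt-pruned-llm",   # hypothetical stub
)

# The input field name follows the 1.x text-generation schema; verify
# against the DeepSparse version you have installed.
output = pipeline(sequences="Large language models are")
print(output)
```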
Build Scalable NLP and Computer Vision Pipelines With DeepSparse - Now Available From the Google Cloud Marketplace
This is the second entry in our Google Cloud blog series. We recently launched our DeepSparse Inference Runtime on the Google Cloud Marketplace to make it easy for ML practitioners to deploy their models with a few clicks. Latency, accuracy, and inference costs are all critical when deploying natural language processing (NLP)…
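For flavor, here is a minimal sketch of the kind of NLP pipeline the post builds with DeepSparse; the SparseZoo model stub is a placeholder assumption, not one taken from the post.

```python
# Minimal sketch: a sentiment-analysis pipeline with DeepSparse.
# Pipeline.create wraps tokenization, engine execution, and post-processing.
from deepsparse import Pipeline

sentiment = Pipeline.create(
    task="sentiment-analysis",
    model_path="zoo:some/pruned-quantized-bert",  # hypothetical stub
)

result = sentiment(["DeepSparse makes CPU inference practical."])
print(result.labels, result.scores)
```

Computer vision tasks follow the same shape: swap the task name (e.g. image classification or object detection) and pass image inputs instead of text.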
Neural Magic’s DeepSparse Inference Runtime Now Available in the Google Cloud Marketplace
Neural Magic's DeepSparse Inference Runtime can now be deployed directly from the Google Cloud Marketplace. DeepSparse supports various machine types on Google Cloud, so you can quickly deploy the infrastructure that works best for your use case, based on cost and performance. In this blog post, we will illustrate how easy it is to get…
Deploy Serverless Machine Learning Inference on AWS with DeepSparse
This blog, originally posted in December 2022, was edited in May 2023 to reflect updates made to the "Batch Deployment Flow" section and GitHub repo links. Leveraging the advantages of serverless computing, developers can deploy and manage AI-driven applications with unprecedented efficiency, scalability, and cost-effectiveness. With serverless deployments, machine learning inference can execute two…
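As a rough sketch of the serverless pattern the post covers, here is a hypothetical AWS Lambda handler wrapping a DeepSparse pipeline; the task, model stub, and event shape are assumptions for illustration, not taken from the post.

```python
# Minimal sketch: DeepSparse inference inside an AWS Lambda handler.
# The pipeline is created at module import time so warm invocations
# reuse the compiled model instead of paying the load cost per request.
import json

from deepsparse import Pipeline

pipeline = Pipeline.create(
    task="sentiment-analysis",
    model_path="zoo:some/pruned-quantized-bert",  # hypothetical stub
)

def handler(event, context):
    # Assumes an API Gateway proxy event with a JSON body like {"text": "..."}.
    body = json.loads(event.get("body", "{}"))
    prediction = pipeline([body.get("text", "")])
    return {
        "statusCode": 200,
        "body": json.dumps({"labels": prediction.labels,
                            "scores": prediction.scores}),
    }
```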
How to Achieve Up to 3X AI Speedup on DigitalOcean's Premium CPUs
As organizations continue to explore and invest in AI to advance their productivity and bottom lines, it's become clear that deploying models can put quite a dent in IT budgets. Organizations want the option to use cost-efficient commodity CPUs to support AI development. One option for companies to consider is DigitalOcean's Premium CPU-Optimized Droplets, launched…
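To make the kind of comparison the post describes concrete, here is a minimal sketch of timing DeepSparse throughput on a CPU droplet. The model stub is a placeholder assumption; DeepSparse also ships a `deepsparse.benchmark` CLI that performs this measurement more rigorously.

```python
# Minimal sketch: rough latency/throughput measurement for a DeepSparse
# pipeline on a CPU instance. For rigorous numbers, prefer the bundled
# `deepsparse.benchmark` CLI.
import time

from deepsparse import Pipeline

pipeline = Pipeline.create(
    task="sentiment-analysis",
    model_path="zoo:some/pruned-quantized-bert",  # hypothetical stub
)

WARMUP, RUNS = 5, 50
sample = ["Benchmarking sparse inference on a premium CPU droplet."]

# Warm up so one-time compilation and cache effects don't skew the timing.
for _ in range(WARMUP):
    pipeline(sample)

start = time.perf_counter()
for _ in range(RUNS):
    pipeline(sample)
elapsed = time.perf_counter() - start

print(f"avg latency: {elapsed / RUNS * 1000:.1f} ms, "
      f"throughput: {RUNS / elapsed:.1f} items/sec")
```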