Welcome to Software-Delivered AI

Unlock the full potential of your ML environment. Accommodate the continuous growth of neural networks without added complexity or cost.

Optimize Your Infrastructure

Simplify ML deployments so you can use compute-heavy models in a cost-efficient and scalable way on existing CPU infrastructure.

Best-in-Class Performance on CPUs

*BERT-Base | Input = [64, 128] | single-stream | c6i.12xlarge | 99% of FP32 accuracy

*YOLOv5s | Input = [64, 3, 640, 640] | single-stream | c6i.12xlarge | 99% of FP32 accuracy
Deploy state-of-the-art models trained on your data with GPU-class performance on commodity CPUs.

Flexible Deployment

Run consistently across cloud, data center, and edge on hardware from any provider, from Intel and AMD to Arm.

Infinite Scalability

Bring horizontal and vertical scale to your ML solutions with physical, virtual, containerized, and serverless deployment options.

Ease of Integration

Use clean APIs for integrating models into applications and monitoring them in production.

“Our close collaboration with Neural Magic has driven outstanding optimizations for 4th Gen AMD EPYC™ processors. Their DeepSparse Platform takes advantage of our new AVX-512 and VNNI ISA extensions, enabling outstanding levels of AI inference performance for the world of AI-powered applications and services.”

- Kumaran Siva, Corporate VP, Software & Systems Business Development, AMD

“The DeepSparse program showed dramatically higher numbers of queries processed per second than many of the standard systems...Neural Magic's work has broad implications for AI and for the chip community.”

“We used the Neural Magic Inference Engine with our sparse models and the results were nothing short of impressive. By using our sparsity method, we were able to achieve almost twice the inference speed with 80% sparsity while still passing the bar of the tinyMLPerf challenge.”

“[Neural Magic is] literally crushing it when it comes to delivering on their mission, to make deep learning more accessible to everybody.”

- Francesco Pochetti, Data Scientist, AWS Machine Learning Hero

Our Products


DeepSparse
Sparsity-aware inference runtime for GPU-class performance on CPUs.
Get Started
SparseML


Open-source libraries for applying sparsification recipes to neural networks.
Get Started
SparseZoo


Open-source model repository for sparse and sparse-quantized models.
Get Started
Sparsify


ML model optimizer to accelerate inference at scale.
Coming Soon
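The core idea behind these tools is sparsification: zeroing out low-impact weights so the runtime can skip them. A minimal sketch of magnitude pruning in plain Python illustrates the concept; this is illustrative only and is not the SparseML or DeepSparse API.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights so that `sparsity`
    fraction of the entries become zero (illustrative sketch)."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # k-th smallest absolute value becomes the pruning threshold
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Prune a toy weight vector to 80% sparsity
w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.3, -0.02, 0.6, 0.1, -0.8]
pruned = magnitude_prune(w, 0.8)
print(pruned)  # 8 of 10 weights zeroed; only the two largest survive
```

In practice, tools like SparseML apply this kind of pruning gradually during training (with recovery steps) rather than in one shot, so accuracy stays near the dense baseline.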

Blog and News

Join the Neural Magic Community