Software-delivered AI Inference

Forget special hardware. Get GPU-class performance on CPUs with our sparsity-aware inference engine.

4-Core CPU (Lenovo Yoga 9 14ITL5) | DeepSparse 1.1.0 | 99% Accuracy | Replicate Now
A100/T4 NVIDIA GPU | TensorFlow 20.06-py3 NGC | 100% Accuracy | NVIDIA Numbers

Try it Now


PICK A USE CASE
Question Answering
Text Classification
Token Classification
Object Detection
Image Classification
1. Benchmark Your Use Case
2. Train With Your Data
3. Deploy To Your Infrastructure

Best CPU Performance, Guaranteed

 

Install DeepSparse, our sparsity-aware inference engine, and benchmark a sparse-quantized version of the Hugging Face BERT-base model.

Throughput speedup: 7x
Accuracy recovery: 99%

See SparseZoo for other sparse models and recipes you can benchmark and prototype from. Click here to see our Sparse Question Answering guide.
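The benchmark step can be sketched as a short terminal session. The SparseZoo stub below is illustrative (stub names vary across SparseZoo versions), and `deepsparse.benchmark` is the engine's benchmarking CLI:

```shell
# Install the DeepSparse engine.
pip install deepsparse

# Benchmark a sparse-quantized BERT-base SQuAD model from SparseZoo.
# The stub name is an example and may differ for your DeepSparse version.
deepsparse.benchmark \
  "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95" \
  --batch_size 1
```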

Apply Your Data With a Few Lines of Code

 

Install SparseML, our open-source library, to transfer learn our sparse-quantized model to your dataset using a few lines of code.

The terminal example uses a public dataset. Click here to apply transfer learning to your own data.
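As a sketch of the transfer-learning step: the entry-point name, flags, and recipe stub below follow SparseML's Transformers integration, but all of them are assumptions and may differ by SparseML version:

```shell
# Install SparseML with PyTorch support.
pip install "sparseml[torch]"

# Transfer-learn the sparse-quantized BERT-base onto a public dataset (SQuAD).
# Entry-point name, flags, and stubs are illustrative for this sketch.
sparseml.transformers.question_answering \
  --model_name_or_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95" \
  --dataset_name squad \
  --recipe "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95?recipe_type=transfer" \
  --output_dir ./sparse_qa_model
```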

Deploy on CPUs at GPU Speeds

 

Use the DeepSparse Engine for best-in-class CPU performance.

Use the DeepSparse Python API to run a question answering model in a few lines of Python.

Click here to learn about deployment options.

Best CPU Performance, Guaranteed

 

Install DeepSparse, our sparsity-aware inference engine, and benchmark a sparse-quantized version of the Hugging Face BERT-base model.

Throughput speedup: 8.1x
Accuracy recovery: 99%

See SparseZoo for other sparse models and recipes you can benchmark and prototype from. Click here to see our Sparse Text Classification: Sentiment Analysis guide.

Apply Your Data With a Few Lines of Code

 

Install SparseML, our open-source library, to transfer learn our sparse-quantized model to your dataset using a few lines of code.

The terminal example uses a public dataset. Click here to apply transfer learning to your own data.

Deploy on CPUs at GPU Speeds

 

Use the DeepSparse Engine for best-in-class CPU performance.

Use the DeepSparse Python API to run a text classification model in a few lines of Python.

Click here to learn about deployment options.

Best CPU Performance, Guaranteed

 

Install DeepSparse, our sparsity-aware inference engine, and benchmark a sparse-quantized version of the Hugging Face BERT-base model.

Throughput speedup: 8.1x
Accuracy recovery: 99%

See SparseZoo for other sparse models and recipes you can benchmark and prototype from. Click here to see our Sparse Token Classification: Named Entity Recognition guide.

Apply Your Data With a Few Lines of Code

 

Install SparseML, our open-source library, to transfer learn our sparse-quantized model to your dataset using a few lines of code.

The terminal example uses a public dataset. Click here to apply transfer learning to your own data.

Deploy on CPUs at GPU Speeds

 

Use the DeepSparse Engine for best-in-class CPU performance.

Use the DeepSparse Python API to run a token classification model in a few lines of Python.

Click here to learn about deployment options.

Best CPU Performance, Guaranteed

 

Install DeepSparse, our sparsity-aware inference engine, and benchmark a sparse-quantized version of the YOLOv5s model to achieve a 12x speedup over PyTorch CPU.

See SparseZoo for other sparse models and recipes you can benchmark and prototype from. Click here to see our YOLOv3 and YOLOv5 benchmarking example in GitHub.

Apply Your Data With a Few Lines of Code

 

Install SparseML, our open-source library, to transfer learn our sparse-quantized model to your dataset using a few lines of code.

Click here to apply transfer learning to your own data.

Deploy on CPUs at GPU Speeds

 

Use the DeepSparse Engine for best-in-class CPU performance.

Click here for our YOLOv3 and YOLOv5 DeepSparse Inference Examples in GitHub.

Best CPU Performance, Guaranteed

 

Install DeepSparse, our sparsity-aware inference engine, and benchmark a sparse-quantized version of the ResNet-50 model to achieve a 7x speedup over ONNX Runtime CPU with 99% of the baseline accuracy.

See SparseZoo for other sparse models and recipes you can benchmark and prototype from. Click here to see our ResNet-50 benchmarking example in GitHub.

Apply Your Data With a Few Lines of Code

 

Install SparseML, our open-source library, to transfer learn our sparse-quantized model to your dataset using a few lines of code.

Click here to apply transfer learning to your own data.

Deploy on CPUs at GPU Speeds

 

Use the DeepSparse Engine for best-in-class CPU performance.

Click here for our ResNet-50 DeepSparse Inference Examples in GitHub.

Join the Deep Sparse Community


MONTHLY NEWSLETTER

PRODUCT & MODEL UPDATES

Intel Labs & Neural Magic

BERT-Large: Prune Once for DistilBERT Inference Performance