DeepSparse
A sparsity-aware inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application.
pip install deepsparse
deepsparse.benchmark "zoo:nlp/question_answering/distilbert-none/pytorch/huggingface/squad/pruned80_quant-none-vnni"
GPUs Are Not Optimal
Machine learning inference has evolved over the years, driven by GPU advancements. GPUs are fast and powerful, but they can be expensive, have short life spans, and require a lot of electricity.

CPUs Are Set for Failure
CPUs are flexible to deploy and more commonly available, but they have generally been discounted in the world of ML: the way current models are developed doesn't suit the CPU's architecture.

What if you could have the best of both worlds?
MEET DEEPSPARSE
Machine Learning Execution Reimagined
DeepSparse achieves its performance using breakthrough algorithms that reduce the computation needed for neural network execution and accelerate the resulting memory-bound computation.
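To make the first half of that claim concrete, here is a minimal, self-contained sketch (using NumPy, not DeepSparse internals) of how pruning a layer's weights reduces the multiply-accumulates an inference engine actually has to perform. The 512x512 layer size, the 80% sparsity level, and the `prune` helper are illustrative assumptions, not part of DeepSparse's API:

```python
import numpy as np

# Toy sketch (not DeepSparse internals): compare the multiply-accumulate
# (MAC) count of a dense layer against an 80%-pruned version of it.
rng = np.random.default_rng(0)

def prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction are zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

dense = rng.standard_normal((512, 512))
sparse = prune(dense, sparsity=0.80)

dense_macs = dense.size                      # every weight contributes a multiply
sparse_macs = int(np.count_nonzero(sparse))  # zero weights can be skipped entirely

print(f"dense MACs:  {dense_macs}")
print(f"sparse MACs: {sparse_macs} ({sparse_macs / dense_macs:.0%} of dense)")
```

A runtime that skips the zeroed weights performs roughly a fifth of the multiplies here; the remaining work is dominated by moving data, which is the memory-bound part the text refers to.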
DeepSparse's architecture is designed to mimic, on commodity CPUs, the way the brain computes:
- It uses sparsity to reduce the number of FLOPs required.
- It uses the CPU's large, fast caches to provide locality of reference, executing the network depthwise and asynchronously.
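The depthwise-execution idea above can be sketched as follows. This is a simplified illustration, not DeepSparse's scheduler: the layer count, tile size, and the elementwise stand-in "layer" are all hypothetical, chosen only to show how pushing one cache-sized tile through every layer keeps the working set small, where layer-at-a-time execution must materialize the full activation map between layers:

```python
import numpy as np

# Toy sketch (hypothetical numbers, not DeepSparse internals): compare the
# activation working set of layer-at-a-time execution against depthwise
# (tile-at-a-time) execution for a stack of elementwise layers.
ACTIVATIONS = 1_000_000   # floats in one full activation map
TILE = 4_096              # floats in one cache-sized tile
LAYERS = 8

def layer(x: np.ndarray) -> np.ndarray:
    """Stand-in for one network layer (elementwise, so tiling is exact)."""
    return np.maximum(x, 0.0) * 1.01

def run_layerwise(x: np.ndarray) -> tuple[np.ndarray, int]:
    """Apply each layer to the full tensor; working set = whole activation map."""
    for _ in range(LAYERS):
        x = layer(x)
    return x, ACTIVATIONS

def run_depthwise(x: np.ndarray, tile: int) -> tuple[np.ndarray, int]:
    """Push one tile through all layers before touching the next tile."""
    out = np.empty_like(x)
    for start in range(0, x.size, tile):
        chunk = x[start:start + tile]
        for _ in range(LAYERS):
            chunk = layer(chunk)
        out[start:start + tile] = chunk
    return out, tile

x = np.random.default_rng(1).standard_normal(ACTIVATIONS).astype(np.float32)
y_layerwise, peak_layerwise = run_layerwise(x)
y_depthwise, peak_depthwise = run_depthwise(x, TILE)

print(f"layerwise working set: {peak_layerwise} floats")
print(f"depthwise working set: {peak_depthwise} floats")
```

Both strategies produce identical results, but the depthwise working set fits in cache, so intermediate activations never round-trip through main memory.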