A sparsity-aware neural network inference engine that delivers GPU-class performance on commodity CPUs, anywhere.
pip install deepsparse
GPUs Are Not Optimal
Machine learning inference has evolved over the years, led by GPU advancements. GPUs are fast and powerful, but they can be expensive, have short life spans, and require a lot of electricity.
CPUs Are Set for Failure
CPUs are flexible in deployment and more commonly available, but have generally been discounted in the world of ML. The way current models are developed doesn't suit CPU architecture.
What if you could have the best of both worlds?
MEET THE DEEPSPARSE ENGINE
Machine Learning Execution Reimagined
The DeepSparse Engine achieves its performance using breakthrough algorithms that reduce the computation needed for neural network execution and accelerate the resulting memory-bound computation.
The DeepSparse architecture is designed to mimic, on commodity CPUs, the way brains compute:
- It uses sparsity to reduce the number of FLOPs required.
- It uses the CPU's large, fast caches to provide locality of reference, executing the network depthwise and asynchronously.
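To make the first point concrete, here is a minimal sketch of how skipping zero weights cuts the multiply count in a single matrix-vector product. It is an illustration only and not the DeepSparse Engine's actual algorithm: the function names (`dense_matvec`, `to_sparse_rows`, `sparse_matvec`), the 64x64 layer size, and the 90% sparsity level are all assumptions chosen for the example.

```python
import random

def dense_matvec(weights, x):
    """Multiply every weight by its input, zero or not. Returns (output, multiply count)."""
    multiplies = 0
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi
            multiplies += 1
        out.append(acc)
    return out, multiplies

def to_sparse_rows(weights):
    """Keep only the nonzero weights of each row as (column, value) pairs."""
    return [[(j, w) for j, w in enumerate(row) if w != 0.0] for row in weights]

def sparse_matvec(sparse_rows, x):
    """Multiply only the stored nonzero weights. Returns (output, multiply count)."""
    multiplies = 0
    out = []
    for row in sparse_rows:
        acc = 0.0
        for j, w in row:
            acc += w * x[j]
            multiplies += 1
        out.append(acc)
    return out, multiplies

# Hypothetical layer: 64x64 weights with roughly 90% of entries pruned to zero.
random.seed(0)
rows, cols, sparsity = 64, 64, 0.9
weights = [
    [0.0 if random.random() < sparsity else random.uniform(-1.0, 1.0) for _ in range(cols)]
    for _ in range(rows)
]
x = [random.uniform(-1.0, 1.0) for _ in range(cols)]

dense_out, dense_multiplies = dense_matvec(weights, x)
sparse_out, sparse_multiplies = sparse_matvec(to_sparse_rows(weights), x)

print(f"dense multiplies:  {dense_multiplies}")
print(f"sparse multiplies: {sparse_multiplies}")
```

Both versions produce the same output vector, but the sparse version performs one multiply per nonzero weight instead of one per weight. With far fewer arithmetic operations per byte of data moved, the remaining work tends to be limited by memory access, which is why cache locality matters in the second point.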