DeepSparse
A sparsity-aware inference runtime that delivers GPU-class performance on commodity CPUs, purely in software, anywhere.
GPUs Are Not Optimal
Machine learning inference has evolved over the years, led by GPU advancements. GPUs are fast and powerful, but they can be expensive, have short life spans, and require a lot of electricity.

CPUs Alone Don't Meet the Bar
CPUs are flexible in deployment and more commonly available. But they are generally discounted in the world of ML due to slow performance as models grow larger.

What if you could have the best of both?
Meet DeepSparse
Accelerate Deep Learning on CPUs
DeepSparse achieves its performance using breakthrough algorithms that reduce the computation needed for neural network execution and accelerate the resulting memory-bound computation.
On commodity CPUs, the DeepSparse architecture is designed to emulate the way the human brain computes:
- It uses sparsity to reduce the number of floating-point operations.
- It uses the CPU’s large fast caches to provide locality of reference, executing the network depth-wise and asynchronously.
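The first point, using sparsity to skip work, can be illustrated with a minimal sketch in plain Python. This is an illustration of the principle only, not DeepSparse's actual kernel code: a matrix-vector multiply that skips zero-valued weights, so a 90%-sparse layer performs roughly 10% of the dense multiply-accumulate operations.

```python
def sparse_matvec(weights, x):
    """Multiply a 2-D weight matrix by vector x, skipping zero weights.

    Returns the result vector and the number of multiplies performed.
    """
    result = [0.0] * len(weights)
    multiplies = 0
    for i, row in enumerate(weights):
        acc = 0.0
        for w, xj in zip(row, x):
            if w != 0.0:  # sparsity: zero weights contribute nothing
                acc += w * xj
                multiplies += 1
        result[i] = acc
    return result, multiplies

# A 90%-sparse 4x10 weight matrix: 4 nonzero weights out of 40.
weights = [
    [0, 0, 2.0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1.5, 0, 0, 0, 0, 0, 0, -1.0, 0],
    [0, 0, 0, 0, 0, 0.5, 0, 0, 0, 0],
]
x = [1.0] * 10

y, ops = sparse_matvec(weights, x)
print(ops)  # 4 multiplies instead of the dense 40
```

A production runtime would not branch per element like this; it would use compressed sparse storage and vectorized kernels, but the arithmetic savings are the same idea.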