DeepSparse
A sparsity-aware inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application.
pip install deepsparse
deepsparse.benchmark "zoo:nlp/question_answering/distilbert-none/pytorch/huggingface/squad/pruned80_quant-none-vnni"
GPUs Are Not Optimal
Machine learning inference has evolved over the years, driven by GPU advancements. GPUs are fast and powerful, but they can be expensive, have short life spans, and require a lot of electricity.

CPUs Are Set for Failure
CPUs are flexible to deploy and more commonly available, but they have generally been discounted in the world of ML: the way current models are developed doesn't suit the CPU's architecture.

What if you could have the best of both worlds?
MEET DEEPSPARSE
Machine Learning Execution Reimagined
DeepSparse achieves its performance using breakthrough algorithms that reduce the computation needed for neural network execution and accelerate the resulting memory-bound computation.
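To make the first half of that claim concrete, here is a minimal, self-contained sketch (using NumPy, not DeepSparse internals) of how pruning a layer's weights reduces the multiply-accumulates an inference engine actually has to perform. The 512x512 layer size, the 80% sparsity level, and the `prune` helper are illustrative assumptions, not part of DeepSparse's API:

```python
import numpy as np

# Toy sketch (not DeepSparse internals): compare the multiply-accumulate
# (MAC) count of a dense layer against an 80%-pruned version of it.
rng = np.random.default_rng(0)

def prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction are zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

dense = rng.standard_normal((512, 512))
sparse = prune(dense, sparsity=0.80)

dense_macs = dense.size                      # every weight contributes a multiply
sparse_macs = int(np.count_nonzero(sparse))  # zero weights can be skipped entirely

print(f"dense MACs:  {dense_macs}")
print(f"sparse MACs: {sparse_macs} ({sparse_macs / dense_macs:.0%} of dense)")
```

A runtime that skips the zeroed weights performs roughly a fifth of the multiplies here; the remaining work is dominated by moving data, which is the memory-bound part the text refers to.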
DeepSparse's architecture is designed to mimic, on commodity CPUs, the way the brain computes:
- It uses sparsity to reduce the number of FLOPs required.
- It uses the CPU's large, fast caches to provide locality of reference, executing the network depthwise and asynchronously.
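The depthwise-execution idea above can be sketched as follows. This is a simplified illustration, not DeepSparse's scheduler: the layer count, tile size, and the elementwise stand-in "layer" are all hypothetical, chosen only to show how pushing one cache-sized tile through every layer keeps the working set small, where layer-at-a-time execution must materialize the full activation map between layers:

```python
import numpy as np

# Toy sketch (hypothetical numbers, not DeepSparse internals): compare the
# activation working set of layer-at-a-time execution against depthwise
# (tile-at-a-time) execution for a stack of elementwise layers.
ACTIVATIONS = 1_000_000   # floats in one full activation map
TILE = 4_096              # floats in one cache-sized tile
LAYERS = 8

def layer(x: np.ndarray) -> np.ndarray:
    """Stand-in for one network layer (elementwise, so tiling is exact)."""
    return np.maximum(x, 0.0) * 1.01

def run_layerwise(x: np.ndarray) -> tuple[np.ndarray, int]:
    """Apply each layer to the full tensor; working set = whole activation map."""
    for _ in range(LAYERS):
        x = layer(x)
    return x, ACTIVATIONS

def run_depthwise(x: np.ndarray, tile: int) -> tuple[np.ndarray, int]:
    """Push one tile through all layers before touching the next tile."""
    out = np.empty_like(x)
    for start in range(0, x.size, tile):
        chunk = x[start:start + tile]
        for _ in range(LAYERS):
            chunk = layer(chunk)
        out[start:start + tile] = chunk
    return out, tile

x = np.random.default_rng(1).standard_normal(ACTIVATIONS).astype(np.float32)
y_layerwise, peak_layerwise = run_layerwise(x)
y_depthwise, peak_depthwise = run_depthwise(x, TILE)

print(f"layerwise working set: {peak_layerwise} floats")
print(f"depthwise working set: {peak_depthwise} floats")
```

Both strategies produce identical results, but the depthwise working set fits in cache, so intermediate activations never round-trip through main memory.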