A Software Architecture for the Future of ML
Sparsify your deep learning models to minimize footprint & run on CPUs at GPU speeds.
Unprecedented Performance: Run models on CPUs at GPU speeds. No special hardware required.
Reduce Costs: Deploy and scale models on commodity CPU servers from the cloud to the edge.
Smaller Footprint: Unlock edge possibilities by reducing model footprint by 20x.
Run Anywhere: Deploy with flexibility on premise, in the cloud, or at the edge.
Open-source, easy-to-use interface to automatically sparsify and quantize deep learning models for CPUs & GPUs.
Open-source libraries and optimization algorithms for CPUs & GPUs, enabling integration with a few lines of code.
Open-source neural network model repository for highly sparse and sparse-quantized models with matching pruning recipes for CPUs and GPUs.
Free CPU runtime that runs sparse models at GPU speeds.
Paths to Sparse Acceleration
A.) Original Dense Path
Take your dense model & run it in the DeepSparse Engine, without any changes.
B.) SparseZoo Path
Take a pre-optimized model & run it in the DeepSparse Engine, or transfer learn with your data.
C.) Sparsified Path
Sparsify and quantize your dense model with ease & run it in the DeepSparse Engine.
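To make "sparsify and quantize" concrete, here is a minimal, library-free sketch of the two ideas: magnitude pruning (zeroing the smallest weights) followed by symmetric int8 quantization. It illustrates the general technique only and is not the algorithm or API used by the tools described above. The function names (`magnitude_prune`, `quantize_int8`, `sparsity_of`) and the toy weight values are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to 0.0."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127].

    Returns (quantized_values, scale), where value ~= quantized * scale.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def sparsity_of(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


if __name__ == "__main__":
    # Toy weights standing in for one layer of a dense model (hypothetical values).
    dense = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.15, -0.4]
    pruned = magnitude_prune(dense, sparsity=0.5)
    quantized, scale = quantize_int8(pruned)
    print("pruned:   ", pruned)
    print("quantized:", quantized, "scale:", scale)
    print("sparsity: ", sparsity_of(quantized))
```

In practice, pruning is usually applied gradually during training so that accuracy can recover, and the speedup depends on a runtime that can skip the zeroed weights. This sketch shows only the final transformation of the weights.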
Using Compound Sparsification for Faster and More Accurate BERT
Webinar: Hugging Face BERT from 3.3x to 14x Faster on CPUs