SparseML

Apply sparsity with a few lines of code, using open-source libraries, to boost neural network inference performance.

Challenges

How Do You Run Models More Efficiently?

Large models are inefficient.

Many of the top models across NLP and computer vision domains are difficult and expensive to use in a real-world deployment. While they are accurate, they are computationally intensive, which can require inflexible hardware accelerators in production.

Small models are less accurate.

Smaller models, while faster and more efficient, deliver less accuracy on real-world data.

How it Works

Optimize Your Models for Inference

SparseML enables you to create inference-optimized sparse models using state-of-the-art pruning and quantization algorithms.

Models trained with SparseML can then be exported to ONNX and deployed with DeepSparse for GPU-class performance on CPU hardware.
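For example, here is a minimal sketch of the training-time workflow, assuming a PyTorch image model and a local recipe file named recipe.yaml; the model, data, and hyperparameters below are placeholders, not recommendations.

```python
# A minimal sketch of training with a SparseML recipe; model, data, and
# hyperparameters are placeholders, and "recipe.yaml" is assumed to exist locally.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet50
from sparseml.pytorch.optim import ScheduledModifierManager

model = resnet50()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Stand-in data; in practice, use your existing dataset and loader.
dataset = TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,)))
train_loader = DataLoader(dataset, batch_size=4)

# Load the recipe and wrap the optimizer so pruning/quantization updates
# are applied automatically at the scheduled points in training.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

for epoch in range(int(manager.max_epochs)):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        loss.backward()
        optimizer.step()

manager.finalize(model)  # remove SparseML hooks once training completes
```

The recipe describes what to sparsify and when; the training loop itself stays whatever you already use.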

Product Overview

An AI Toolkit for Developers

SOTA Optimization Algorithms

Introduce sparsity to boost model performance while maintaining the same accuracy.
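As a sketch of what such a recipe can look like (for instance, the recipe.yaml referenced in the training sketch above), assuming SparseML's YAML recipe format with modifiers such as GMPruningModifier and QuantizationModifier; the epochs and sparsity targets are illustrative placeholders.

```python
# An illustrative pruning + quantization recipe, written here as a Python string
# and saved to the "recipe.yaml" file used above; all schedule and sparsity
# values are placeholders, not tuned recommendations.
recipe = """
modifiers:
    - !EpochRangeModifier
        start_epoch: 0.0
        end_epoch: 30.0

    - !GMPruningModifier
        start_epoch: 0.0
        end_epoch: 20.0
        update_frequency: 1.0
        init_sparsity: 0.05
        final_sparsity: 0.85
        params: __ALL_PRUNABLE__

    - !QuantizationModifier
        start_epoch: 25.0
"""

with open("recipe.yaml", "w") as recipe_file:
    recipe_file.write(recipe)
```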

Pre-Optimized Models

Fine-tune pre-sparsified versions of common models, such as BERT, YOLOv5, and ResNet-50, on your own datasets.

Training Pipeline Integrations

Integrate with your existing SOTA software, such as PyTorch, Ultralytics, and Hugging Face.

Standard Logging

Gain visibility into model experiments through tracking with TensorBoard and Weights & Biases.
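A minimal logging sketch, assuming SparseML's PyTorch logging helpers (TensorBoardLogger, WANDBLogger) and their log_scalar method; verify the exact logging API against your installed version.

```python
# A sketch of experiment logging; the class names follow SparseML's PyTorch
# logging utilities, and the metric names/values are illustrative placeholders.
from sparseml.pytorch.utils import TensorBoardLogger, WANDBLogger

# WANDBLogger assumes the wandb package is installed and you are logged in.
loggers = [TensorBoardLogger(), WANDBLogger()]

for step, loss in enumerate([0.9, 0.7, 0.5]):  # stand-in training metrics
    for logger in loggers:
        logger.log_scalar("train/loss", loss, step=step)
```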

ONNX Export

Export your sparse models to ONNX format for deployment with DeepSparse.
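A minimal export sketch, assuming SparseML's ModuleExporter utility; the model, output directory, and sample batch shape are placeholders.

```python
# A sketch of ONNX export; the "export" directory and sample batch shape are
# placeholders, and the model would normally be the sparsified one you trained.
import torch
from torchvision.models import resnet50
from sparseml.pytorch.utils import ModuleExporter

model = resnet50()  # placeholder for your trained, sparsified model
exporter = ModuleExporter(model, output_dir="export")
exporter.export_onnx(sample_batch=torch.randn(1, 3, 224, 224))
# The exported ONNX file lands in the output directory, ready for DeepSparse.
```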

Manageable Workflow

Get started with just a few lines of code.

Custom Modifiers

Write custom modifiers to adjust any training graph and quickly reach a functioning implementation.

Free & Open Source

Inspect, modify, and enhance the open-source code, which is actively maintained with SOTA algorithms.

Optimize

Your Model Optimization Toolkit

Accelerate Inference
Deploy models optimized by SparseML on DeepSparse for GPU-class performance on CPUs.
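A minimal inference sketch, assuming DeepSparse's compile_model API; the ONNX path and input shape are placeholders matching the export sketch above.

```python
# A sketch of CPU inference with DeepSparse; "export/model.onnx" and the
# input shape are placeholders.
import numpy
from deepsparse import compile_model

engine = compile_model("export/model.onnx", batch_size=1)
inputs = [numpy.random.rand(1, 3, 224, 224).astype(numpy.float32)]
outputs = engine.run(inputs)
print(outputs[0].shape)
```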

Use Common Models
Convenient integrations for optimizing PyTorch, Ultralytics, and Hugging Face models.

Leverage the Latest
Adopt state-of-the-art (SOTA) compression algorithms to make inference efficient.