SparseML

Enable sparsity with a few lines of code, through open-source libraries, to boost neural network inference performance.

Challenges

How Do You Run Models Faster and More Efficiently?

Large models are inefficient.

Many of the top models in NLP and computer vision are difficult and expensive to deploy in the real world. They are accurate but computationally intensive, and can require inflexible hardware accelerators in production.

Small models are less accurate.

Smaller models, while faster and more efficient, are less accurate on real-world data.

How It Works

Optimize Your Models for Inference

SparseML enables you to create inference-optimized models using state-of-the-art pruning and quantization algorithms.
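To make the idea of pruning concrete, here is a minimal sketch of unstructured magnitude pruning, one common pruning technique: the weights with the smallest absolute values are set to zero until a target sparsity is reached. This is a plain-Python illustration, not SparseML's API; the function names `magnitude_prune` and `measured_sparsity` and the sample weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries zeroed.

    `sparsity` is the fraction of entries to zero out (0.0 to 1.0).
    Hypothetical helper for illustration; not part of SparseML.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0.0 and 1.0")
    num_to_prune = int(len(weights) * sparsity)
    # Rank indices by absolute value, smallest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(ranked[:num_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


def measured_sparsity(weights):
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Made-up weights standing in for one layer of a network.
layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.4, -0.1]
pruned = magnitude_prune(layer, sparsity=0.5)
print(pruned)                     # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0, 0.4, 0.0]
print(measured_sparsity(pruned))  # 0.5
```

In practice, pruning is usually applied gradually during training or fine-tuning so the remaining weights can compensate for the ones removed.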

Models trained with SparseML in PyTorch can then be exported and deployed with nm-vllm and DeepSparse.

Product Overview

An AI Toolkit for Developers

SOTA Optimization Algorithms

Boost inference performance while preserving accuracy by introducing sparsity.

Pre-Optimized Models

Fine-tune pre-sparsified versions of common models like BERT, YOLOv5, and ResNet-50 on your own datasets.

Training Pipeline Integrations

Use it with the software you already train with, such as PyTorch, Ultralytics, and Hugging Face.

Standard Logging

Track model experiments through TensorBoard and Weights & Biases.

ONNX Export

Export your sparse models to ONNX format for deployment with DeepSparse.

Manageable Workflow

Get started with just a few lines of code.

Custom Modifiers

Modify any training graph and easily get to a functioning implementation.
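As a rough illustration of the kind of logic a modifier might encode, the sketch below computes a target sparsity for each training step using a cubic ramp, a schedule commonly used for gradual magnitude pruning. SparseML's own modifier interface is not described here, so `cubic_sparsity` and its parameters are hypothetical names chosen for this example only.

```python
def cubic_sparsity(step, start_step, end_step, init_sparsity, final_sparsity):
    """Target sparsity at `step`, ramping from init to final along a cubic curve.

    Hypothetical helper for illustration; not part of SparseML.
    """
    if end_step <= start_step:
        raise ValueError("end_step must be greater than start_step")
    if step <= start_step:
        return init_sparsity
    if step >= end_step:
        return final_sparsity
    # Fraction of the pruning window completed so far, in (0, 1).
    progress = (step - start_step) / (end_step - start_step)
    # Sparsity rises quickly at first, then flattens as it nears the target.
    return final_sparsity + (init_sparsity - final_sparsity) * (1.0 - progress) ** 3


# Ramp from 0% to 90% sparsity over steps 0 through 10.
schedule = [cubic_sparsity(s, 0, 10, 0.0, 0.9) for s in range(11)]
for step, target in enumerate(schedule):
    print(f"step {step:2d}: target sparsity {target:.4f}")
```

A training loop could call such a function each step and prune the model to the returned target, removing most weights early while the network still has many steps left to recover.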

Free & Open Source

Inspect, modify, and enhance software that is kept current with SOTA algorithms.

Optimize

Your Model Optimization Toolkit

Accelerate Inference

Deploy models optimized by SparseML with GPU-class performance on DeepSparse.

Use Common Models

Convenient integrations for optimizing PyTorch, Ultralytics, and Hugging Face models.

Leverage the Latest

Adopt state-of-the-art (SOTA) compression algorithms to make inference efficient.
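Quantization is one such compression technique. The sketch below shows symmetric 8-bit quantization of a list of weights in plain Python: each value is mapped to an integer in [-127, 127] using a single scale factor, then mapped back to measure the rounding error. It is a simplified illustration under stated assumptions, not SparseML's implementation; `quantize_int8`, `dequantize`, and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] with one shared scale factor.

    Hypothetical helper for illustration; not part of SparseML.
    """
    max_abs = max(abs(v) for v in values)
    # Avoid dividing by zero when every value is 0.0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]


# Made-up sparse weights; zeros stay exactly zero after quantization.
weights = [0.8, 0.0, 0.3, 0.0, -0.6, 0.0, 0.4, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(f"scale={scale:.6f}, max round-trip error={max_error:.6f}")
```

Each stored weight shrinks from a 32-bit float to an 8-bit integer, at the cost of a round-trip error of at most half the scale factor per value.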