Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference (ICML 2020)

In July 2020, at the International Conference on Machine Learning, we presented a paper on methods for maximizing the sparsity of the activations in a trained neural network.

We showed that this sparsity, when coupled with an efficient sparse-input convolution algorithm, can be leveraged for significant performance gains.
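To make the idea concrete, here is a minimal pure-Python sketch of why zeros in the activations translate into skipped work. It is an illustration under our own simplifying assumptions (a 1D, single-channel, stride-1 "valid" cross-correlation), not the optimized algorithm from the paper; the function names `activation_sparsity`, `dense_conv1d`, and `sparse_input_conv1d` are hypothetical and chosen only for this example.

```python
def relu(values):
    """Standard ReLU: negative pre-activations become exact zeros."""
    return [v if v > 0.0 else 0.0 for v in values]


def activation_sparsity(values):
    """Fraction of activations that are exactly zero."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == 0.0) / len(values)


def dense_conv1d(x, w):
    """Reference 'valid' cross-correlation: out[j] = sum_k x[j + k] * w[k]."""
    n_out = len(x) - len(w) + 1
    return [sum(x[j + k] * w[k] for k in range(len(w))) for j in range(n_out)]


def sparse_input_conv1d(x, w):
    """Same result as dense_conv1d, but iterates over inputs and skips zeros.

    Each nonzero input x[i] is scattered into every output it contributes to.
    Returns the output and the number of multiply-accumulates performed.
    """
    n_out = len(x) - len(w) + 1
    out = [0.0] * n_out
    macs = 0
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # a zero activation contributes nothing, so skip it
        for k, wk in enumerate(w):
            j = i - k  # output position that pairs x[i] with w[k]
            if 0 <= j < n_out:
                out[j] += xi * wk
                macs += 1
    return out, macs


if __name__ == "__main__":
    pre_activations = [-1.0, 2.0, -3.0, 0.5, -0.2, -4.0, 1.5, -0.1]
    kernel = [1.0, -1.0, 0.5]

    activations = relu(pre_activations)
    sparse_out, macs = sparse_input_conv1d(activations, kernel)
    dense_macs = (len(activations) - len(kernel) + 1) * len(kernel)

    print("activation sparsity:", activation_sparsity(activations))
    print("outputs match dense:", sparse_out == dense_conv1d(activations, kernel))
    print("multiply-accumulates:", macs, "sparse vs", dense_macs, "dense")
```

The number of multiply-accumulates in the sparse version scales with the number of nonzero activations rather than with the input size, which is why sparser activations leave less work to do. Turning that reduction into real wall-clock speedups takes a carefully engineered implementation, which this sketch does not attempt.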

If you want to learn more about pruning, start with the first post in our five-part blog series: What is Pruning in Machine Learning? (Make sure you also read part two, where we state what the best pruning approach is!)