NeuralFlix
Unlock Faster and More Efficient Language Models with SparseGPT
Presenter: Dan Alistarh, Mark Kurtz
Discover SparseGPT, a novel machine learning model optimization approach that allows large language models (LLMs) to be pruned and quantized in one shot, so they can be deployed on commodity CPUs at GPU-class speeds. This video provides an overview of SparseGPT, as well as deployment benchmarks for LLMs on CPU hardware.
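To make the "prune and quantize in one shot" idea concrete, here is a minimal, generic sketch of one-shot magnitude pruning followed by symmetric int8 quantization of a weight matrix. This is an illustration of the general compression recipe only, not SparseGPT's actual algorithm (which uses approximate second-order information to choose and compensate for pruned weights); all names and the 50% sparsity target are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)).astype(np.float32)  # toy weight matrix

# One-shot magnitude pruning: zero out the 50% of weights
# with the smallest absolute value, with no retraining.
sparsity = 0.5
threshold = np.quantile(np.abs(W), sparsity)
mask = np.abs(W) >= threshold
W_pruned = W * mask

# Simple symmetric int8 quantization of the surviving weights.
scale = np.abs(W_pruned).max() / 127.0
W_int8 = np.round(W_pruned / scale).astype(np.int8)

# At inference time the runtime would dequantize (or compute in int8):
W_dequant = W_int8.astype(np.float32) * scale

print(f"achieved sparsity: {(W_pruned == 0).mean():.2f}")
```

A sparse, quantized matrix like `W_int8` is what lets a CPU runtime skip zeroed weights and use narrow integer arithmetic, which is the source of the CPU speedups benchmarked in the video.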