According to a recent poll from Ultralytics, the creators of the YOLO object detection models, 22% of ML experts have difficulty deploying their vision AI models. Getting a model into production is hard, and scaling it once it is there is even harder.
To improve this step in the ML pipeline, Ultralytics partnered with Neural Magic, whose DeepSparse runtime takes advantage of sparsity and low-precision arithmetic within neural networks to deliver strong performance on commodity hardware. Neural Magic has sparsified several versions of the YOLO models for everyone to use, and you can find them in our SparseZoo. As a reminder, sparse models are both pruned and quantized, so they are easier to deploy and run significantly faster with minimal loss of accuracy, especially when deployed with DeepSparse on commodity x86 CPUs.
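To make "pruned and quantized" concrete, here is a toy sketch in plain Python. It is not Neural Magic's sparsification tooling or the DeepSparse API; the function names (`magnitude_prune`, `quantize_int8`, `dequantize`) and the example weights are hypothetical, and real pipelines apply these steps gradually during training rather than in one shot. It only illustrates the two ideas: zeroing the smallest-magnitude weights, then storing the survivors as 8-bit integers plus a scale factor.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy with the smallest-magnitude fraction of weights set to zero."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [v * scale for v in q]


# Hypothetical weights for illustration only.
weights = [0.02, -0.75, 0.10, 0.50, -0.03, 0.31, -0.01, 0.90]

pruned = magnitude_prune(weights, sparsity=0.5)  # half the weights become 0.0
q, scale = quantize_int8(pruned)                 # small ints plus one float scale
restored = dequantize(q, scale)                  # close to `pruned`, within scale / 2
```

The zeros are what a sparsity-aware runtime can skip, and the integer representation is what enables low-precision arithmetic; the rounding error per weight is bounded by half the scale, which is why accuracy loss can stay small.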
Watch the video below, recorded on February 8, 2023, to see how you can use Neural Magic's sparsification tools and our DeepSparse runtime to achieve GPU-class performance for YOLOv5 (and other YOLO models) on commodity CPUs.