Feb 09, 2023
According to a recent poll from Ultralytics, the creators of the YOLOv5 and YOLOv8 object detection models, 22% of ML experts experience difficulty deploying their vision AI models. Getting a model into production is hard, and scaling it once it is there is even harder.
To improve this step in the ML pipeline, Ultralytics partnered with Neural Magic, whose DeepSparse runtime takes advantage of sparsity and low-precision arithmetic within neural networks to offer exceptional performance on commodity hardware. Neural Magic has sparsified different versions of the YOLO models for everyone to use, and you can find them in our SparseZoo. As a reminder, these sparse models are both pruned and quantized, so they are easier to deploy and deliver significant performance improvements at minimal cost to accuracy, especially when run with DeepSparse on commodity x86 CPUs.
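To make "pruned and quantized" concrete, here is a minimal sketch in plain Python of the two ideas applied to a toy weight list. It assumes simple magnitude pruning and symmetric per-tensor INT8 quantization; the function names and the sample weights are hypothetical, and this is not how Neural Magic's tooling or the SparseZoo models are actually produced.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices sorted from smallest to largest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]


# Hypothetical toy weights, for illustration only
weights = [0.02, -0.75, 0.31, -0.04, 1.27, 0.005, -0.48, 0.09]

pruned = magnitude_prune(weights, sparsity=0.5)  # half of the weights become zero
q, scale = quantize_int8(pruned)                 # remaining weights stored as small integers
restored = dequantize(q, scale)                  # approximation of the pruned weights
```

The zeros introduced by pruning are work a sparsity-aware runtime can skip, and the integer values are cheaper to store and compute with than 32-bit floats. The small gap between `restored` and `pruned` is the quantization error, which is the kind of accuracy cost the paragraph above refers to.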
View the video below, recorded on February 8, 2023, to see how you can use Neural Magic's sparsification tools and our DeepSparse runtime to achieve GPU-class performance for YOLOv5 (and other YOLO models) on commodity CPUs.
To see our YOLO benchmarks and to learn more about how we optimized YOLO models to run at best-in-class speeds on commodity CPUs, read our YOLOv5 and YOLOv8 blogs.
If you have any questions or need assistance with deploying and scaling your AI efforts, contact us directly. To use DeepSparse in your environment and on your own time, start a 90-day free trial.