Blog - Neural Magic

Explore Our Latest Insights

Bringing the Neural Magic to GPUs

Announcing Community Support for GPU Inference Serving. Over the past five years, Neural Magic has focused on accelerating inference of deep learning models on CPUs. To achieve this, we did two things: … Many of the techniques we used to make CPUs more efficient can also help GPUs process LLMs…


03.05.2024

Pushing the Boundaries of Mixed-Precision LLM Inference With Marlin

In the rapidly evolving landscape of large language model (LLM) inference, the quest for speed and efficiency on modern GPUs has become a critical challenge. Enter Marlin, a groundbreaking Mixed Auto-Regressive Linear kernel that unlocks unprecedented performance for FP16xINT4 matrix multiplications. Developed by Elias Frantar at IST-DASLab and named after one of the…


04.17.2024

YOLOv8 Detection 10x Faster With DeepSparse—Over 500 FPS on a CPU

Introducing YOLOv8, the latest object detection, segmentation, and classification architecture to hit the computer vision scene! Developed by Ultralytics, the authors behind the wildly popular YOLOv3 and YOLOv5 models, YOLOv8 takes object detection to the next level with its anchor-free design. But it's not just about cutting-edge accuracy. YOLOv8 is designed for real-world deployment, with a…


01.18.2023