Bringing the Neural Magic to GPUs
03/05/24
Announcing Community Support for GPU Inference Serving
Over the past five years, Neural Magic has focused on accelerating inference of deep learning models on CPUs. Many of the techniques we used to make CPUs more efficient can also help GPUs in their processing of LLMs.…
Neural Magic 1.6 Product Release
12/21/23
For the last several months, we’ve been quite busy building out features across our libraries to enable large language model (LLM) inference on CPUs. We upgraded SparseML to support LLMs and generative models through transformers training, sparsification, and export pipelines. DeepSparse, Neural Magic’s inference runtime, has also been enhanced for performant LLM inference. Keep reading…