LLMs on CPUs.
From research to code, use model sparsity to accelerate open-source LLMs and bring operational simplicity to GenAI deployments.
Accelerated CPU Inference
Leverage Neural Magic's cutting-edge model sparsity techniques to run your open-source LLM on commodity hardware.
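As a concrete illustration of what CPU inference with a sparse LLM can look like, here is a minimal sketch using Neural Magic's DeepSparse Python API. The SparseZoo model stub, prompt, and generation parameters are illustrative assumptions based on DeepSparse's public examples, not a recipe from this page.

```python
# Minimal sketch: text generation with a sparse LLM on CPU via DeepSparse.
# Install with: pip install deepsparse[llm]
from deepsparse import TextGeneration

# The "zoo:" model stub below is illustrative; any sparse LLM stub from
# sparsezoo.neuralmagic.com (pruned and/or quantized) can be used instead.
pipeline = TextGeneration(model="zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized")

# DeepSparse executes the pruned, quantized model on commodity x86 CPUs,
# exploiting sparsity to skip zeroed weights at inference time.
result = pipeline(prompt="Explain model sparsity in one sentence.",
                  max_new_tokens=64)
print(result.generations[0].text)
```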
State-of-the-Art Model Optimization Research
In collaboration with the Institute of Science and Technology Austria, Neural Magic conducts innovative LLM compression research and shares impactful findings with the open-source community, including the state-of-the-art Sparse Fine-Tuning technique.