vLLM Office Hours: Get the latest updates, connect with committers, and up-level your vLLM skills. Join us!
Products
nm-vllm
Enterprise inference server for LLMs on GPUs.
DeepSparse
Sparsity-aware inference server for LLMs, CV and NLP models on CPUs.
Community
vLLM Office Hours
Join our bi-weekly vLLM office hours to learn, ask, and give feedback.
GitHub
Look under the hood and contribute to our open-source code.
SparseZoo
Get started faster with our open-source model repository.
Hugging Face
Deliver fast inference with our pre-optimized, open-source LLMs.
Docs
Access the tutorials, guides, examples, and more.
Blog
Resources
Research Papers
Learn more about the magic behind Neural Magic.
Support
Get the answers you need.
Company
About Us
Who's Neural Magic?
Our Technology
How does it work?
Careers
Interested in joining our team?
Contact
Have a question for us?
Let's Connect
Neural Magic accelerates open-source LLM, CV, and NLP models and brings operational simplicity to your AI deployments. Schedule time to experience our model optimization and inference acceleration software in action with one of our team's experts.