Jul 12, 2024