Deploy Open-Source LLMs with vLLM and Neural Magic

Deploy LLMs More Efficiently with vLLM and Neural Magic

Learn why vLLM is the leading open-source inference server and how Neural Magic works with enterprises to build and scale vLLM-based model services more efficiently and at lower cost.

The ecosystem of open-source LLMs has exploded over the past year. A new model tops the leaderboard almost every week. Enterprises can now deploy state-of-the-art, open-source LLMs like Llama 3 securely on their infrastructure of choice, fine-tuned with their data for domain-specific use cases, at a significantly lower cost than proprietary APIs.

vLLM has emerged as the most popular inference server for deploying open-source LLMs, offering leading performance, ease of use, broad model support, and support for heterogeneous hardware backends.

Neural Magic is a leading contributor to the vLLM project and offers nm-vllm, an enterprise-ready vLLM distribution. nm-vllm includes:

Stable builds of vLLM with long-term support, from model to silicon
Tools and expertise for optimizing LLMs for inference with techniques like quantization and sparsity
Reference architectures for scalable deployments with Kubernetes
Integration of telemetry and key monitoring systems

Watch our webinar recording from July 11, 2024, to learn: