Join Us Every Other Week For
vLLM Office Hours
As a leading contributor to vLLM, Neural Magic partners with vLLM project committers and the vLLM team at UC Berkeley to host bi-weekly office hours. Join us to give feedback, ask questions, and hear about cutting-edge developments to accelerate your inference.
Typical office hours agenda:
- 20-minute vLLM update
- 20-minute special guest topic; see below for details 👇
- 20-minute open discussion, feedback loop, and Q&A
vLLM Office Hours #21 - vLLM Production Stack Deep Dive - March 6, 2025
Join us for an overview of the components in the vLLM Production Stack (https://github.com/vllm-project/production-stack) and practical guidance on deploying it effectively. We’ll cover the technical details, including the prefix-aware router and its role in optimizing request routing, as well as KV cache offloading and its impact on performance and scalability.
Session slides: https://docs.google.com/presentation/d/1sE4IVpgPv4gGMJqv6iXJYOyd0Qm4PH__/
Join our bi-weekly vLLM Office Hours to learn about the latest features and updates: https://hubs.li/Q02Y5Pbh0 ...