Join Us Every Other Week For
vLLM Office Hours
As a leading contributor to vLLM, Neural Magic partners with vLLM project committers and the vLLM team at UC Berkeley to host bi-weekly office hours. Join us to give feedback, ask questions, and hear about cutting-edge developments to accelerate your inference. Typical office hours agenda:
- 20-minute vLLM update
- 20-minute special guest topic; see below for details 👇
- 20-minute open discussion, feedback loop, and Q&A
vLLM Office Hours - Speculative Decoding in vLLM - October 3, 2024
In this vLLM office hours session, we explore the latest updates in vLLM v0.6.2, including Llama 3.2 Vision support, the introduction of MQLLMEngine for API Server, and beam search externalization. Following these updates, Lily Liu, vLLM Committer and PhD student at UC Berkeley, joins us to discuss speculative decoding in vLLM. She provides insights into what speculative decoding is, its different types, performance benefits in vLLM, research ideas surrounding it, and how to apply it effectively within vLLM.
Session slides: https://docs.google.com/presentation/d/1wUoLmhfX6B7CfXy3o4m-MdodRL26WvY3/
Join our bi-weekly vLLM office hours: https://neuralmagic.com/community-office-hours/ ...