Join Us Every Other Week For
vLLM Office Hours
As a leading contributor to vLLM, Neural Magic partners with vLLM project committers and the vLLM team at UC Berkeley to host bi-weekly office hours. Join us to give feedback, ask questions, and hear about cutting-edge developments to accelerate your inference. Typical office hours agenda:
- 20-minute vLLM update
- 20-minute special guest topic; see below for details 👇
- 20-minute open discussion, feedback loop, and Q&A
vLLM Office Hours - vLLM Project Update and Open Discussion - January 09, 2025
In this session, we shared the latest updates in vLLM v0.6.6, including exciting new features such as Prefix Caching for Vision Language Models and support for macOS with Apple Silicon (M1 and newer). We also previewed the vLLM Roadmap for Q1 2025, highlighting upcoming advancements to accelerate LLM inference and enhance cross-platform compatibility.
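For those who want to try the prefix caching feature mentioned above, here is a minimal sketch of enabling automatic prefix caching in vLLM's offline `LLM` API. The model name and prompts are assumptions for illustration; any supported model works the same way.

```python
# Minimal sketch: enable automatic prefix caching so repeated prompt prefixes
# (e.g. a shared system prompt or image context) reuse cached KV blocks.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-VL-2B-Instruct",  # assumed example model; swap in your own
    enable_prefix_caching=True,          # reuse KV cache across shared prefixes
)

shared_prefix = "You are a helpful assistant. Answer concisely.\n\n"
prompts = [
    shared_prefix + "What is vLLM?",
    shared_prefix + "What is prefix caching?",
]

outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
for out in outputs:
    print(out.outputs[0].text)
```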
During the open discussion, we tackled several community questions. These included when bind_tools support will be available in the LangChain vLLM integration, whether DeepSeek's FP8 quantization is truly blockwise (2D) or 1D groupwise (a distinction sketched in the snippet below), and plans for expert-parallel optimizations within Mixture of Experts (MoE). Participants also asked how vLLM relates to other frameworks such as Unsloth, Hugging Face, and Georgi Gerganov's llama.cpp, and whether there is a map of the landscape.
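To make the blockwise-versus-groupwise question concrete, here is an illustrative sketch (not DeepSeek's or vLLM's actual kernels; the 128-element group and 128x128 block sizes are assumptions) showing how the two schemes assign scale factors to a weight matrix.

```python
# Illustrative sketch: 1D group-wise scales (one scale per contiguous group
# along a row) vs. 2D block-wise scales (one scale per 2D tile of the matrix).
import numpy as np

W = np.random.randn(256, 256).astype(np.float32)  # toy weight matrix
FP8_MAX = 448.0  # max magnitude representable in float8_e4m3

# 1D group-wise: split each row into groups of 128 columns, one scale per group.
group = 128
scales_1d = np.abs(W.reshape(256, 256 // group, group)).max(axis=-1) / FP8_MAX

# 2D block-wise: tile the matrix into 128x128 blocks, one scale per block.
blk = 128
tiles = W.reshape(256 // blk, blk, 256 // blk, blk).transpose(0, 2, 1, 3)
scales_2d = np.abs(tiles).max(axis=(-1, -2)) / FP8_MAX

print(scales_1d.shape)  # (256, 2) -> one scale per row-group
print(scales_2d.shape)  # (2, 2)   -> one scale per 128x128 block
```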
Session slides: https://docs.google.com/presentation/d/1Uic6jQZRUS9l7TuoNeaBrjeLwAGa98xs/
Join our bi-weekly vLLM Office Hours to learn about the latest features and updates: https://hubs.li/Q02Y5Pbh0