Join Us Every Other Week For
vLLM Office Hours
As a leading contributor to vLLM, Neural Magic partners with vLLM project committers and the vLLM team at UC Berkeley to host bi-weekly office hours. Join us to give feedback, ask questions, and hear about cutting-edge developments to accelerate your inference. Typical office hours agenda:
- 20-minute vLLM update
- 20-minute special guest topic; see below for details 👇
- 20-minute open discussion, feedback loop, and Q&A
vLLM Office Hours - Deep Dive into Mistral on vLLM - October 17, 2024
In this session of our bi-weekly vLLM office hours, we walked through the updates in the vLLM v0.6.3 release: experimental fullgraph torch.compile, a new Feature Compatibility Matrix, and the Machete w4a16 kernel for Hopper GPUs. We also covered new VLM support for GLM-4V, Molmo, and NVLM-D; tool-use support for Llama 3.1+3.2 and InternLM2.5; and Reward LM support for Qwen2.5-Math-RM-72B.
For the special-topic deep dive, we were joined by Patrick von Platen, a research engineer at Mistral AI, who shared insights into Mistral's architecture choices and how to efficiently deploy Mistral's models on vLLM.
During the Q&A, we tackled audience questions on topics such as architecture redesign strategies, rotary position embeddings, vLLM support for ARM architecture, OpenAI Whisper, Seq2Seq support in v0.6.3, and more.
Session slides: https://docs.google.com/presentation/d/1fF4ZlnAFXDeKHBGzkJsCeXLkarvlbNRx
Explore and join our vLLM office hours, held every other Thursday: https://neuralmagic.com/community-office-hours/ ...