Join Us Every Other Week For
vLLM Office Hours
As a leading contributor to vLLM, Neural Magic partners with vLLM project committers and the vLLM team at UC Berkeley to host bi-weekly office hours. Join us to give feedback, ask questions, and hear about cutting-edge developments to accelerate your inference. Typical office hours agenda:
- 20-minute vLLM update
- 20-minute special guest topic; see below for details 👇
- 20-minute open discussion, feedback loop, and Q&A
vLLM Office Hours #23 - Deep Dive Into the LLM Compressor - April 10, 2025
LLM Compressor is an easy-to-use library for optimizing models for deployment with vLLM. In this session, we show the power of LLM Compressor through examples and easy pathways to get started with optimizing LLMs for fast and efficient vLLM inference. We also share an update on Llama 4 Day 0 support in vLLM. Enjoy!
Session slides: https://docs.google.com/presentation/d/1S2NnFqkbX4jLe84sY4ITSP4lOKGph5eN/
Join our bi-weekly vLLM Office Hours to learn about the latest features and updates: https://hubs.li/Q02Y5Pbh0 ...