Sparsify Open Source LLMs for Accelerated Inference

Organizations are starting to feel the friction as they deploy LLMs to production. Their intense compute requirements strain inference serving infrastructure and, as a result, make these models expensive to operate and support.

So, how do you deploy your LLMs without breaking the bank? Sparsify them to remove redundant weights, reduce memory footprint, and increase performance while keeping accuracy intact. You can also start with a sparsified foundation model to get to production more quickly.
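To make "sparsify" concrete, here is a minimal, illustrative sketch of unstructured magnitude pruning on a single layer using PyTorch's built-in pruning utilities. This is not Neural Magic's production pipeline, which applies gradual pruning schedules with recovery fine-tuning across a full model; it simply shows the core idea of zeroing out low-magnitude weights.

```python
# Minimal sketch: unstructured magnitude pruning with PyTorch.
# Illustrative only; real LLM sparsification uses gradual pruning
# schedules and fine-tuning to recover accuracy.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(4096, 4096)  # stand-in for one LLM projection layer

# Zero out the 50% of weights with the smallest absolute magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.5)
prune.remove(layer, "weight")  # bake the zeros into the weight tensor

sparsity = (layer.weight == 0).float().mean().item()
print(f"Layer sparsity: {sparsity:.0%}")  # ~50%
```

Sparse weights like these can then be exploited by a sparsity-aware runtime to skip the zeroed computations, which is where the inference speedups come from.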

Join Mark Kurtz, Neural Magic's CTO, as he shares industry-leading model optimization and inference acceleration techniques you can apply to your LLM applications today.

Date: Tuesday, June 4, 2024
Time: 1:00 PM EDT
Location: Virtual. Register to receive a join link.

Save Your Spot