LTX-2 Now Production-Ready on SGLang x GMI Cloud

GMI and SGLang have jointly integrated LTX-2, a 19B-parameter open-source model for text-to-video, image-to-video, and joint video-plus-audio generation, into the SGLang inference framework, production-validated and available now.

Teams Can Now Use LTX-2!

SGLang has been the go-to high-performance serving framework for LLM inference. It's fast, it's well-maintained, and a lot of teams have already built their inference stack around it. 

Until now, if you also needed video generation, you were running a second stack alongside it. That's no longer the case.

LTX-2 in SGLang means text and video inference through a single API and a single operational model to maintain. For teams building multimodal products, or just trying to avoid the overhead of managing separate pipelines for every new modality, this is a real consolidation.
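As a rough illustration of what "a single API" can look like from the client side, here is a minimal sketch of building a text-to-video request payload. The model identifier, field names, and defaults are illustrative assumptions for this sketch, not SGLang's documented schema for LTX-2.

```python
import json

# Hypothetical client-side payload for a text-to-video request sent through
# the same serving layer that handles LLM traffic. The model name, field
# names, and defaults are illustrative assumptions, not a documented schema.
def build_video_request(prompt: str, num_frames: int = 97, fps: int = 24) -> str:
    payload = {
        "model": "ltx-2",          # assumed model identifier
        "prompt": prompt,
        "num_frames": num_frames,  # clip length in frames
        "fps": fps,                # playback rate
    }
    return json.dumps(payload)

print(build_video_request("a slow pan across a foggy harbor at dawn"))
```

The point is operational, not syntactic: the video request rides the same HTTP serving layer, auth, and monitoring you already run for text inference.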

The other thing worth noting: LTX-2 is one of the strongest open-source video generation models available. With it accessible through the same SGLang API you'd use for any LLM workload, the barrier to experimenting with it, and eventually shipping it, drops considerably.

The Technicals

Getting a 19B-parameter video model into a production serving framework is not a straightforward port. The research environment and the production environment demand almost entirely different things from the model.

GMI's engineering team worked directly with SGLang to close that gap. That meant validating multi-GPU parallelism on 8-GPU clusters, implementing CPU offloading for memory-efficient deployment, and tuning throughput for the kind of sustained API load that real workloads generate. 
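For orientation, an 8-GPU tensor-parallel launch along these lines might look like the following sketch. The model path and CPU-offload sizing are placeholder assumptions for illustration, not validated production settings.

```shell
# Sketch only: --tp shards the model across 8 GPUs via tensor parallelism;
# the model path and the CPU-offload budget are placeholder assumptions.
python -m sglang.launch_server \
  --model-path Lightricks/LTX-2 \
  --tp 8 \
  --cpu-offload-gb 16 \
  --host 0.0.0.0 --port 30000
```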

It also meant validating the SGLang implementation against the reference outputs at the pixel level, because quality drift under a serving framework is a real failure mode that's easy to miss and hard to debug later.

The integration passed. The SGLang implementation of LTX-2 matches the reference outputs at the pixel level. What you get through the API is what the model is supposed to produce.
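A parity check of this kind can be sketched in a few lines. The snippet below uses toy arrays standing in for decoded frames; the shapes and the tolerance parameter are illustrative, not the actual validation harness.

```python
import numpy as np

def frames_match(ref: np.ndarray, out: np.ndarray, tol: int = 0) -> bool:
    """Pixel-level parity between two uint8 video tensors of shape
    (frames, height, width, channels). tol=0 demands exact equality."""
    if ref.shape != out.shape:
        return False
    diff = np.abs(ref.astype(np.int16) - out.astype(np.int16))
    return int(diff.max()) <= tol

# Toy data standing in for reference vs. serving-framework output.
ref = np.random.default_rng(0).integers(0, 256, (8, 4, 4, 3), dtype=np.uint8)
print(frames_match(ref, ref.copy()))  # True: bit-exact copy
drifted = ref.copy()
drifted[0, 0, 0, 0] ^= 1              # flip one low-order bit
print(frames_match(ref, drifted))     # False at tol=0
```

Exact comparison at tol=0 is the strict version of the check; a nonzero tolerance would allow for benign numeric differences, which the LTX-2 validation did not need.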

Try it now

LTX-2 is available today through SGLang on GMI's GPU infrastructure.

If you want to run it yourself, get started on GMI Cloud with a dedicated endpoint or try our closed-source versions. If you're evaluating it for a specific workload — video API, multimodal product, high-volume generation pipeline — talk to the GMI engineering team about deployment architecture and what the infrastructure actually looks like at scale.

Shout-out and Special Thanks

We’d like to give a special shout-out to FlamingoPg from the SGLang community for his outstanding contributions to SGLang's LTX-2 support. He contributed thousands of lines of code, including major work on the LTX-2 VAE and the integration of various audio-video processing components. His efforts significantly strengthened the stability and completeness of the LTX-2 pipeline within SGLang. We truly appreciate his dedication and his impact on the community.

Colin Mo
Head of Content