M*: A Modular, Extensible, Serving System for Multimodal Models
M* is a universal serving system for multimodal AI models. It supports heterogeneous components, such as vision encoders and audio codecs, through a modular abstraction called the Walk Graph. Each request is processed as a traversal of a dataflow graph, which lets models be composed flexibly and optimized in distributed deployments. In benchmarks, M* achieves 20% lower latency than vLLM-Omni on text-to-image generation with BAGEL and significantly improves throughput on text-to-speech with Qwen3-Omni.
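The summary does not describe the Walk Graph API. The following is therefore only a toy sketch of the general idea of serving a request as a dataflow-graph traversal, not M*'s implementation. `WalkGraphSketch`, `add`, `serve`, `build_demo_graph`, and every stub component are hypothetical. In the sketch, two output heads (image and audio) share one backbone node. Serving a request walks only the subgraph upstream of the requested output, in dependency order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class Node:
    """Hypothetical component node: consumes named upstream values, emits one value."""
    fn: Callable[..., Any]
    inputs: List[str] = field(default_factory=list)


class WalkGraphSketch:
    """Toy dataflow graph (hypothetical; not M*'s API)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.last_trace: List[str] = []  # nodes executed by the most recent request

    def add(self, name: str, fn: Callable[..., Any], inputs: Iterable[str] = ()) -> None:
        self.nodes[name] = Node(fn, list(inputs))

    def serve(self, request: Dict[str, Any], target: str) -> Any:
        # Collect the target plus every node upstream of it; request fields are sources.
        needed: set = set()
        stack = [target]
        while stack:
            name = stack.pop()
            if name in needed or name in request:
                continue
            needed.add(name)
            stack.extend(self.nodes[name].inputs)
        # Order only the needed nodes so each runs after its inputs are ready.
        deps = {n: set(self.nodes[n].inputs) & needed for n in needed}
        values = dict(request)
        self.last_trace = []
        for name in TopologicalSorter(deps).static_order():
            node = self.nodes[name]
            values[name] = node.fn(*(values[i] for i in node.inputs))
            self.last_trace.append(name)
        return values[target]


def build_demo_graph() -> WalkGraphSketch:
    # Stub components standing in for real models (all hypothetical).
    g = WalkGraphSketch()
    g.add("tokens", lambda prompt: prompt.lower().split(), inputs=["prompt"])
    g.add("hidden", lambda toks: [len(t) for t in toks], inputs=["tokens"])  # shared backbone
    g.add("image", lambda h: f"image[{sum(h)}]", inputs=["hidden"])  # image-decoder head
    g.add("audio", lambda h: f"audio[{len(h)}]", inputs=["hidden"])  # audio-codec head
    return g


g = build_demo_graph()
print(g.serve({"prompt": "A red Cat"}, "image"), g.last_trace)
```

Because the graph, rather than a fixed pipeline, decides which components run, a text-to-speech request here never touches the image head. This is one plausible reading of the "flexible model composition" described above.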