Multimodal
Zyphra Releases Zamba2-VL: Hybrid Mamba2–Transformer Vision-Language Models That Cut Time-to-First-Token by About an Order of Magnitude
Zyphra has released Zamba2-VL, a family of open vision-language models at 1.2B, 2.7B, and 7B parameters, built on a hybrid architecture that combines Mamba2 state-space and Transformer components. The models cut time-to-first-token by about an order of magnitude while remaining competitive with existing Transformer-based vision-language models, which matters to practitioners building latency-sensitive vision-language applications.
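Time-to-first-token is the delay between submitting a request and receiving the first generated token. As a rough illustration of how it can be measured, here is a minimal Python sketch. The names `measure_ttft` and `fake_stream` are hypothetical, the stream stands in for any streaming generation API, and the sleep durations are simulated placeholders, not Zamba2-VL measurements.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttft(
    make_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, str]:
    """Return (seconds until the first token, the first token).

    make_stream is called inside the timed region so that any work done
    before the first token is produced is included in the measurement.
    """
    start = clock()
    stream = iter(make_stream())
    first = next(stream, None)
    if first is None:
        raise ValueError("stream produced no tokens")
    return clock() - start, first


def fake_stream(prefill_seconds: float, tokens: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for a model: wait, then yield tokens."""
    time.sleep(prefill_seconds)  # simulated delay before the first token
    yield from tokens


if __name__ == "__main__":
    # Simulated delays only; these are not benchmark results for any model.
    slow, _ = measure_ttft(lambda: fake_stream(0.20, ["A", "cat"]))
    fast, _ = measure_ttft(lambda: fake_stream(0.02, ["A", "cat"]))
    print(f"slow: {slow:.3f}s  fast: {fast:.3f}s  ratio: {slow / fast:.1f}x")
```

Passing a factory rather than an already-created stream keeps any eager setup work inside the timed region. In practice the same wrapper could be placed around a real streaming client, with several runs averaged to smooth out noise.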