Models · MarkTechPost · 7 d ago

Meet Qwen-RobotSuite: Three Embodied AI Models for VLA Manipulation, Video World Modeling, and Navigation

Qwen-RobotSuite introduces three embodied AI models: RobotManip, a 4-billion-parameter Vision-Language-Action (VLA) model based on Qwen3.5 for manipulation tasks; RobotWorld, a language-conditioned video world model with a 60-layer MMDiT architecture; and RobotNav, a navigation model built on Qwen3-VL and available in 2B, 4B, and 8B parameter sizes. The article covers each model's architecture, data pipeline, and benchmark results.

