Models
SmolVLA: Efficient Vision-Language-Action Model Trained on LeRobot Community Data
SmolVLA is a vision-language-action (VLA) model trained on data from the LeRobot community. As a VLA model, it takes visual observations and language instructions as input and produces actions as output, and it is designed to do so efficiently. Its compact architecture significantly reduces the parameter count while maintaining performance on benchmarks relevant to vision-language tasks. This efficiency and adaptability make it a practical option for practitioners building lightweight AI systems for robotics and interactive applications.
vision-language, smolvla