Agents
Flow Control: Steering Vision-Language-Action Models with Simple Real-Time Inputs
The paper presents a flow control method for vision-language-action (VLA) models that lets a user steer the model's actions in real time with simple inputs, such as keyboard commands, without retraining. The approach draws on the VLA's learned action distribution to generate high-quality, high-fidelity actions. The paper reports higher task success rates and faster completion times, as well as improved performance when the model is fine-tuned on flow control trajectories. For practitioners, the method offers an intuitive interface for guiding VLA actions in real-world applications.
vision-language, real-time, control