Agents
SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models
The article introduces SCALE, an inference strategy for Vision-Language-Action (VLA) models that improves robustness by jointly modulating visual perception and action according to the model's own uncertainty, an idea inspired by Active Inference theory. Unlike traditional test-time scaling methods, which require additional training and multiple forward passes, SCALE needs only a single forward pass, and it improves performance on both simulated and real-world benchmarks. Practitioners get adaptive execution in uncertain environments without the overhead of extra training or verification, which simplifies deployment in robotic control applications.
robotics, action, vision-language
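
To make the idea of uncertainty-conditioned modulation concrete, here is a minimal sketch in Python. It is not SCALE's actual implementation: the summary does not specify how uncertainty is measured or applied, so everything here is an assumption for illustration. The sketch assumes self-uncertainty is the normalized entropy of the action distribution from one forward pass, and that it linearly sets two hypothetical knobs, an action sampling temperature and a visual attention temperature. The names `modulate`, `t_min`, and `t_max` are invented for this example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    """Shannon entropy divided by log(n), so the result lies in [0, 1]."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)

def modulate(action_logits, t_min=0.5, t_max=1.5):
    """Map uncertainty from a single forward pass to two temperatures.

    Assumption (not from the source): both perception and action are
    modulated by linearly interpolating a temperature between t_min
    (confident) and t_max (uncertain).
    """
    u = normalized_entropy(softmax(action_logits))
    temperature = t_min + (t_max - t_min) * u
    return {
        "uncertainty": u,
        "action_temperature": temperature,  # hypothetical: broader action sampling when unsure
        "visual_temperature": temperature,  # hypothetical: broader visual attention when unsure
    }

confident = modulate([100.0, 0.0, 0.0, 0.0])
uncertain = modulate([0.0, 0.0, 0.0, 0.0])
print(confident["action_temperature"], uncertain["action_temperature"])
```

In this sketch a peaked action distribution yields temperatures near `t_min`, while a flat one yields temperatures near `t_max`. No extra forward passes or learned verifier are involved, which is the property the summary attributes to SCALE.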