AgentsarXiv cs.AI — 7 d ago

SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models

The article introduces SCALE, a novel inference strategy for Vision-Language-Action (VLA) models that enhances robustness by jointly modulating visual perception and action based on self-uncertainty, inspired by Active Inference theory. Unlike traditional test-time scaling methods that require additional training and multiple forward passes, SCALE operates efficiently with a single forward pass, improving performance on both simulated and real-world benchmarks. This approach allows practitioners to achieve adaptive execution in uncertain environments without the overhead of extra training or verification, thereby streamlining deployment in robotic control applications.

roboticsactionvision-languagerelevance 0.00 · engagement 0.00

Read at source ↗← all news