Inference
Real-Time Execution with Autoregressive Policies
The article presents a method for achieving real-time execution with autoregressive policies in Vision-Language-Action models by modifying the tokenization horizon and employing constrained decoding. This approach ensures strict latency bounds and enables multi-trajectory decoding, resulting in superior task completion speeds compared to flow-matching policies. The findings highlight the competitive viability of autoregressive policies for real-time applications, emphasizing their faster convergence and better generalizability in instruction-following tasks, which is crucial for practitioners aiming to enhance performance in realistic deployments.
real-time-executionautoregressive-policiesinference