Agents
Goal-Autopilot: A Verifiable Anti-Fabrication Firewall for Unattended Long-Horizon Agents
The article introduces Goal-Autopilot, a verifiable anti-fabrication firewall for long-horizon LLM agents that prevents an agent from falsely claiming task completion while it runs unattended. The system is built as a gated finite-state machine that keeps task state outside the model's context and enforces a No-False-Success theorem, which keeps the per-step context cost constant. On a 3,150-cell benchmark, Goal-Autopilot reaches a fabrication rate of only 0.95%, significantly outperforming baselines such as Reflexion and StateFlow. For practitioners, this matters because it makes autonomous agents more reliable in real-world applications: the design prioritizes honesty (not reporting success that has not been verified) over raw capability, reducing the risk of erroneous outputs.
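The summary does not describe Goal-Autopilot's actual interface, so the following is only a minimal sketch of the general pattern it names. It shows a gated state machine whose state lives outside the model's context, where the agent's own claim of completion can never set the finished state by itself and only an external verifier can. Every name here (`Phase`, `ExternalState`, `GatedFSM`, `claim_done`, `verifier`, the 200-character note cap) is hypothetical. The sketch does not reproduce the paper's gates or its No-False-Success proof.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List

NOTE_CAP = 200  # hypothetical cap so the per-step summary stays bounded in size


class Phase(Enum):
    RUNNING = "running"
    VERIFYING = "verifying"
    DONE = "done"


@dataclass
class ExternalState:
    """Task state stored outside the LLM context (hypothetical schema)."""
    goal: str
    phase: Phase = Phase.RUNNING
    completed_steps: List[str] = field(default_factory=list)
    last_note: str = ""


class GatedFSM:
    """Hypothetical gated FSM: DONE is reachable only through the verifier gate."""

    def __init__(self, goal: str, verifier: Callable[[ExternalState], bool]):
        self.state = ExternalState(goal=goal)
        self.verifier = verifier

    def context_for_step(self) -> Dict[str, str]:
        # A fixed set of short fields is handed to the agent each step,
        # instead of the growing transcript, so context cost stays bounded.
        return {
            "goal": self.state.goal,
            "phase": self.state.phase.value,
            "steps_done": str(len(self.state.completed_steps)),
            "last_note": self.state.last_note[:NOTE_CAP],
        }

    def record_step(self, step: str) -> None:
        # Work may only be logged while the task is still running.
        if self.state.phase is not Phase.RUNNING:
            raise RuntimeError(f"cannot record steps in phase {self.state.phase.value}")
        self.state.completed_steps.append(step)
        self.state.last_note = step

    def claim_done(self) -> Phase:
        # The agent's claim only requests verification; the verifier decides.
        self.state.phase = Phase.VERIFYING
        if self.verifier(self.state):
            self.state.phase = Phase.DONE
        else:
            self.state.phase = Phase.RUNNING
            self.state.last_note = "completion claim rejected by verifier"
        return self.state.phase


if __name__ == "__main__":
    # Hypothetical verifier: the task counts as done only once tests were run.
    fsm = GatedFSM("fix bug", lambda s: "run tests" in s.completed_steps)
    fsm.record_step("edit file")
    print(fsm.claim_done())  # Phase.RUNNING (premature claim rejected)
    fsm.record_step("run tests")
    print(fsm.claim_done())  # Phase.DONE
```

The key design choice is that `claim_done` has no code path that sets `DONE` without the verifier returning `True`. A premature claim is rejected and the agent is sent back to `RUNNING` with a note, rather than being allowed to report success.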
llm, autonomy, verification