Agents
StepGuard: Guarding Web Navigation via Single-Step Calibration
The article introduces StepGuard, a framework that makes web navigation agents more reliable by addressing single-step fragility. It combines two components: Dynamic Dual-Policy Optimization (DDPO) and Confidence-Guided Adaptive Navigation Reflection (CANR). DDPO alternates between a navigation mode and an answer mode to resolve reward conflicts. CANR estimates confidence levels and triggers self-correction only when necessary, which improves accuracy. The article reports that StepGuard outperforms existing methods and achieves state-of-the-art results on web navigation benchmarks, which makes it relevant to practitioners building more reliable AI agents for web-based tasks.
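The confidence-gated self-correction idea behind CANR can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say how confidence is estimated, so the sketch assumes it is the top softmax probability over candidate action scores. The `choose_action` function, the `reflect` callback, and the `0.6` threshold are all hypothetical names and values chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw action scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(
    actions: Sequence[str],
    scores: Sequence[float],
    reflect: Callable[[Sequence[str], Sequence[float]], str],
    threshold: float = 0.6,  # hypothetical cut-off, not taken from the article
) -> Tuple[str, float, bool]:
    """Return (action, confidence, reflected).

    Assumption: confidence is the probability of the top-scoring action.
    The reflection step runs only when that confidence is below the threshold.
    """
    probs = softmax(scores)
    best = max(range(len(actions)), key=lambda i: probs[i])
    confidence = probs[best]
    if confidence >= threshold:
        # Confident enough: keep the original action and skip reflection.
        return actions[best], confidence, False
    # Low confidence: hand the candidates to the (hypothetical) reflection step.
    return reflect(actions, probs), confidence, True


# Placeholder reflection that simply picks the last candidate.
def pick_last(actions: Sequence[str], probs: Sequence[float]) -> str:
    return actions[-1]


confident = choose_action(["click", "type"], [5.0, 0.0], pick_last)
uncertain = choose_action(["click", "type"], [0.1, 0.0], pick_last)
```

In this sketch the first call keeps the top-scoring action because one candidate clearly dominates, while the second call falls below the threshold and defers to the reflection callback. Gating reflection this way avoids paying for self-correction on steps where the agent is already confident.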
web navigation, reinforcement learning, error calibration