Agents · arXiv cs.AI · 11 d ago

StepGuard: Guarding Web Navigation via Single-Step Calibration

StepGuard is a framework for making web navigation agents more robust to single-step fragility. It combines two components: Dynamic Dual-Policy Optimization (DDPO) and Confidence-Guided Adaptive Navigation Reflection (CANR). DDPO alternates between a navigation mode and an answer mode to resolve reward conflicts. CANR estimates the agent's confidence and triggers self-correction only when necessary, which improves accuracy. The authors report state-of-the-art results on web navigation benchmarks, which matters to practitioners building more reliable AI agents for web-based tasks.
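The confidence-gated idea behind CANR can be illustrated with a small sketch. This is not the paper's method: the summary does not say how confidence is estimated, so the code below assumes it is the top softmax probability over candidate actions, and the names `choose_action`, `reflect`, and `threshold` are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over a list of action scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(
    logits: Sequence[float],
    reflect: Callable[[int], int],
    threshold: float = 0.6,  # hypothetical cutoff, not taken from the paper
) -> Tuple[int, float, bool]:
    """Return (action index, confidence, whether reflection was triggered).

    Assumption: confidence is the probability of the top-scoring action.
    The `reflect` callback stands in for a self-correction step and is
    called only when confidence falls below `threshold`.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence >= threshold:
        return best, confidence, False
    return reflect(best), confidence, True


# Confident step: the first action is kept without reflection.
confident = choose_action([5.0, 0.0, 0.0], reflect=lambda i: i)
# Uncertain step: scores are close, so the reflection callback decides.
uncertain = choose_action([1.0, 0.9, 0.8], reflect=lambda i: 2)
```

Gating reflection on a threshold keeps the extra self-correction cost off the steps where the agent is already confident, which is the trade-off the summary attributes to CANR.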

web navigation · reinforcement learning · error calibration
