Agents
SENTINEL: Failure-Driven Reinforcement Learning for Training Tool-Using Language Model Agents
SENTINEL is a failure-driven reinforcement learning framework for training tool-using language model agents. It converts rollout failures into targeted training tasks through a Controller-Proposer-Solver architecture: the Controller identifies recurring error patterns in failed trajectories, the Proposer generates tasks that target those weaknesses, and the Solver is trained on the generated tasks. On Tau2-Bench Retail with the Qwen3-4B-Thinking-2507 model, SENTINEL raised Pass^1 from 66.4 to 74.9, which suggests that a model's own failures can serve as an effective training signal for building more reliable agents.
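The Controller-Proposer-Solver loop can be pictured with a minimal Python sketch. This is only an illustration of the data flow described in the summary, not SENTINEL's implementation: the `Rollout` type, the `error_tag` labels, the `min_count` threshold, the placeholder task strings, and the `solver_update` stand-in (which queues tasks instead of running a reinforcement learning update) are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rollout:
    task: str
    success: bool
    error_tag: Optional[str] = None  # hypothetical label for what went wrong


def controller(rollouts: List[Rollout], min_count: int = 2) -> List[str]:
    """Return error patterns that recur across failed rollouts."""
    counts = Counter(
        r.error_tag for r in rollouts if not r.success and r.error_tag
    )
    return [tag for tag, n in counts.most_common() if n >= min_count]


def proposer(patterns: List[str], tasks_per_pattern: int = 2) -> List[str]:
    """Generate placeholder training tasks aimed at each recurring weakness."""
    return [
        f"practice[{p}]#{i}" for p in patterns for i in range(tasks_per_pattern)
    ]


def solver_update(task_queue: List[str], new_tasks: List[str]) -> List[str]:
    """Stand-in for an RL update: this sketch only queues tasks for training."""
    return task_queue + new_tasks


# Hypothetical rollouts; the task names and error tags are invented.
rollouts = [
    Rollout("refund order", False, "wrong_tool_argument"),
    Rollout("cancel order", False, "wrong_tool_argument"),
    Rollout("change address", False, "skipped_confirmation"),
    Rollout("track order", True),
]

patterns = controller(rollouts)              # recurring failures only
queue = solver_update([], proposer(patterns))
print(patterns)
print(queue)
```

In this toy run only the error tag that appears at least twice becomes a training target, mirroring the idea that the Controller looks for recurring patterns, not one-off failures.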
reinforcement-learning, tool-using, failure-driven