Agents
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
The paper introduces Sharpness-Aware Policy Optimization (SHAPO), a reinforcement learning method that uses epistemic uncertainty to make exploration safer. SHAPO's sharpness-aware policy update evaluates gradients at perturbed parameters, which biases learning toward conservative behavior in under-explored regions and amplifies the influence of rare unsafe actions. Across continuous-control tasks, SHAPO improves both safety and task performance over existing baselines and expands their Pareto frontiers. These gains matter for deploying RL agents in safety-sensitive applications.
reinforcement learning, safe exploration
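The phrase "evaluates gradients at perturbed parameters" can be illustrated with a minimal sketch. It assumes a generic SAM-style step (perturb along the normalized gradient, then apply the gradient computed there to the original parameters) on a toy quadratic loss. The summary does not give SHAPO's exact update rule, so the function `sharpness_aware_step`, the helper `toy_grad`, and the values of `lr`, `rho`, and `CURVATURE` are all hypothetical and are not taken from the paper.

```python
import math

def sharpness_aware_step(theta, grad_fn, lr=0.1, rho=0.05):
    """One SAM-style update on a loss to be minimized (illustrative only)."""
    g = grad_fn(theta)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Step a distance rho along the normalized gradient (a nearby higher-loss point).
    perturbed = [t + rho * gi / norm for t, gi in zip(theta, g)]
    # Evaluate the gradient at the perturbed point, but apply it to the original theta.
    g_perturbed = grad_fn(perturbed)
    return [t - lr * gp for t, gp in zip(theta, g_perturbed)]

# Hypothetical toy loss 0.5 * sum(c_i * theta_i^2), standing in for a negated policy objective.
CURVATURE = [10.0, 1.0]

def toy_grad(theta):
    return [c * t for c, t in zip(CURVATURE, theta)]

theta = [1.0, 1.0]
sam_next = sharpness_aware_step(theta, toy_grad)
# Ordinary gradient step from the same point, for comparison.
plain_next = [t - 0.1 * g for t, g in zip(theta, toy_grad(theta))]
```

On this toy loss the perturbed gradient is larger than the plain one, and the gap is widest along the high-curvature coordinate, so the sharpness-aware step moves further there than ordinary gradient descent does. How that effect translates into reweighting rare unsafe actions is specific to the paper's analysis and is not reproduced here.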