Agents · arXiv cs.AI · 21 h ago

SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration

The paper introduces Sharpness-Aware Policy Optimization (SHAPO), a reinforcement learning method for safer exploration that leverages epistemic uncertainty. Its policy update evaluates gradients at perturbed parameters rather than at the current ones, which biases learning toward conservative behavior in under-explored regions and amplifies the influence of rare unsafe actions. In experiments on continuous-control tasks, SHAPO improves both safety and task performance, outperforming existing baselines and expanding their safety-performance Pareto frontiers, which matters for deploying RL agents in safety-sensitive applications.
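To make "evaluates gradients at perturbed parameters" concrete, here is a minimal sketch of a generic sharpness-aware (SAM-style) update on a toy quadratic loss. It is not the paper's SHAPO objective, which this summary does not specify. The names `sam_step`, `toy_loss`, `toy_grad`, and the values of `lr` and `rho` are hypothetical and chosen only for illustration.

```python
import math


def sam_step(theta, grad_fn, lr=0.05, rho=0.05):
    """One generic SAM-style update on a list of floats.

    Assumption: this is the standard two-pass sharpness-aware step,
    not SHAPO's actual policy update.
    """
    g = grad_fn(theta)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Zero gradient: no ascent direction, so leave parameters unchanged.
        return list(theta)
    # Pass 1: step of L2 length rho along the gradient (approximate worst case nearby).
    theta_perturbed = [t + rho * gi / norm for t, gi in zip(theta, g)]
    # Pass 2: evaluate the gradient at the perturbed parameters...
    g_perturbed = grad_fn(theta_perturbed)
    # ...and apply it to the ORIGINAL parameters.
    return [t - lr * gi for t, gi in zip(theta, g_perturbed)]


def toy_loss(theta):
    """Hypothetical stand-in loss: sharp along x, flat along y."""
    x, y = theta
    return 0.5 * (10.0 * x * x + 0.1 * y * y)


def toy_grad(theta):
    """Analytic gradient of toy_loss."""
    x, y = theta
    return [10.0 * x, 0.1 * y]


theta = [1.0, 1.0]
initial_loss = toy_loss(theta)
for _ in range(50):
    theta = sam_step(theta, toy_grad)
final_loss = toy_loss(theta)
```

In a policy-gradient setting, `grad_fn` would be replaced by the gradient of the policy objective. How SHAPO chooses the perturbation and connects it to epistemic uncertainty is described in the paper itself.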

reinforcement learning · safe exploration
