Agents · arXiv cs.AI · 10 d ago

Safe Exploration via Policy Priors

SOOPER is an approach to safe exploration in reinforcement learning that uses conservative policy priors derived from offline data or simulators. It relies on probabilistic dynamics models to balance optimistic exploration against safe fallback strategies, so that safety is maintained throughout learning while convergence to an optimal policy is still guaranteed. In experiments, SOOPER scales well and outperforms existing state-of-the-art methods on key safe RL benchmarks, which is consistent with its theoretical safety guarantees holding in practice.

reinforcement learning · policy · safety