Training · arXiv cs.AI · 15 d ago

Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic Methods

The paper introduces PAVE (Policy-Aware Value-field Equalization), a critic-centric regularization framework that stabilizes the Q-gradient field in actor-critic methods. It targets policy non-smoothness that originates in the differential geometry of the critic rather than in the actor. By minimizing Q-gradient volatility while preserving the critic's local curvature, PAVE improves policy smoothness without modifying the actor and performs comparably to existing policy-side smoothness methods. For practitioners, it offers a theoretically grounded way to make continuous actor-critic policies more stable and more ready for real-world deployment.
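To make "Q-gradient volatility" concrete, here is a minimal NumPy sketch under stated assumptions. In deterministic actor-critic methods (DDPG/TD3-style; the summary does not say which base algorithms PAVE uses), the actor is pushed along ∇_a Q(s, a) evaluated at a = μ(s). A critic whose action-gradient swings sharply between nearby states therefore tends to produce a jagged policy. The diagnostic `q_gradient_volatility` below is hypothetical and defined only for illustration: the mean squared change in ∇_a Q under small state perturbations. It is not PAVE's loss, which the summary does not specify, and `make_toy_critic` is likewise an illustrative stand-in for a learned critic.

```python
import numpy as np


def action_gradient(q_fn, s, a, eps=1e-5):
    """Central finite-difference estimate of dQ/da for a batch.

    q_fn maps (states [N, ds], actions [N, da]) -> Q-values [N].
    Returns an array of shape [N, da].
    """
    grad = np.zeros_like(a, dtype=float)
    for j in range(a.shape[1]):
        step = np.zeros_like(a, dtype=float)
        step[:, j] = eps
        grad[:, j] = (q_fn(s, a + step) - q_fn(s, a - step)) / (2 * eps)
    return grad


def q_gradient_volatility(q_fn, s, a, sigma=0.01, seed=0):
    """Hypothetical diagnostic (not PAVE's objective): mean squared change in
    dQ/da when states are perturbed by Gaussian noise of scale sigma, with
    the actions held fixed."""
    rng = np.random.default_rng(seed)
    s_pert = s + sigma * rng.standard_normal(s.shape)
    diff = action_gradient(q_fn, s_pert, a) - action_gradient(q_fn, s, a)
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def make_toy_critic(freq):
    # Toy critic Q(s, a) = -||a - sin(freq * s)||^2 (state and action dims
    # are equal here). Its preferred action, and hence dQ/da, oscillates
    # faster in s as freq grows.
    def q_fn(s, a):
        return -np.sum((a - np.sin(freq * s)) ** 2, axis=1)
    return q_fn


rng = np.random.default_rng(1)
states = rng.uniform(-1, 1, size=(256, 2))
actions = rng.uniform(-1, 1, size=(256, 2))

smooth = q_gradient_volatility(make_toy_critic(1.0), states, actions)
jagged = q_gradient_volatility(make_toy_critic(20.0), states, actions)
print(f"smooth critic: {smooth:.2e}, jagged critic: {jagged:.2e}")
```

The high-frequency critic scores far higher than the smooth one. A critic-side regularizer in this spirit would add a weighted penalty of this kind to the critic loss only and leave the actor's objective unchanged. How PAVE additionally preserves local curvature is not described in the summary.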

actor_critic · policy_smoothness
