Training
Trust-Region Diffusion Policies for Massively Parallel On-Policy RL
The paper introduces Trust-Region Diffusion Policies (TruDi), an approach for training diffusion-model policies with massively parallel on-policy reinforcement learning. TruDi uses a trust-region update rule that keeps each policy update within a KL-divergence constraint. This addresses the rapidly shifting data distribution that on-policy training produces. Across 73 tasks from four RL benchmarks, TruDi matches or outperforms strong existing baselines, with its clearest gains on complex humanoid control. The authors present these results as setting a new standard for on-policy RL methods.
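The summary does not say how TruDi enforces its KL constraint, so the sketch below only illustrates the general idea of a KL trust region. It uses a different, well-known mechanism: a TRPO-style backtracking line search on a diagonal Gaussian policy, where KL has a closed form. It shrinks a proposed update toward the old parameters until KL(old || new) falls below a bound. The function names and the parameters `max_kl`, `shrink` and `max_iters` are hypothetical and are not taken from the paper.

```python
import math

def gaussian_kl(mu_old, std_old, mu_new, std_new):
    """Closed-form KL(old || new) between diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for mo, so, mn, sn in zip(mu_old, std_old, mu_new, std_new):
        kl += math.log(sn / so) + (so ** 2 + (mo - mn) ** 2) / (2 * sn ** 2) - 0.5
    return kl

def trust_region_step(mu_old, std_old, mu_prop, std_prop,
                      max_kl, shrink=0.5, max_iters=20):
    """Hypothetical backtracking line search (generic sketch, not TruDi's rule).

    Interpolates from the old parameters toward the proposed ones with step
    size alpha, halving alpha until the KL constraint is satisfied.
    Returns (mu_new, std_new, alpha); alpha == 0.0 means the update was rejected.
    """
    alpha = 1.0
    for _ in range(max_iters):
        mu_new = [mo + alpha * (mp - mo) for mo, mp in zip(mu_old, mu_prop)]
        # Both endpoints are positive and alpha is in [0, 1], so std stays positive.
        std_new = [so + alpha * (sp - so) for so, sp in zip(std_old, std_prop)]
        if gaussian_kl(mu_old, std_old, mu_new, std_new) <= max_kl:
            return mu_new, std_new, alpha
        alpha *= shrink
    # No acceptable step found: keep the old policy.
    return list(mu_old), list(std_old), 0.0

mu, std, alpha = trust_region_step([0.0], [1.0], [2.0], [1.0], max_kl=0.1)
print(mu, std, alpha)  # [0.25] [1.0] 0.125
```

A diffusion policy has no closed-form KL between successive policies, so any practical method for diffusion models has to estimate or bound this quantity differently. The Gaussian case here only shows how a KL bound limits the size of each update.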
reinforcement-learning, policy