Training
Proximal Policy Optimization for Amortized Discrete Sampling
This paper applies Proximal Policy Optimization (PPO) to Generative Flow Networks (GFlowNets), which train stochastic policies to sample from structured discrete probability distributions. The authors report that PPO improves convergence speed and data efficiency over traditional GFlowNet training methods, with validation on benchmarks that include synthetic energies and molecular graph generation. For practitioners, the result suggests that a PPO-based objective can make GFlowNet training more efficient on sampling tasks over discrete data.
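For readers unfamiliar with PPO, the sketch below shows the standard clipped surrogate loss in plain Python. It illustrates the general PPO objective only and is not the authors' implementation: the function name `ppo_clipped_loss`, the default `clip_eps=0.2`, and the way advantages are supplied are assumptions made for this example, and the summary does not say how the paper defines advantages or rewards for GFlowNet trajectories.

```python
import math


def ppo_clipped_loss(new_logps, old_logps, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over sampled actions.

    new_logps:  log-probabilities of the sampled actions under the current policy
    old_logps:  log-probabilities of the same actions under the policy that sampled them
    advantages: advantage estimates for those actions (assumed to be given)
    clip_eps:   clipping range for the probability ratio (hypothetical default)
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the surrogate.
    return -total / len(new_logps)
```

The clipping removes the incentive to move the policy far from the one that generated the samples, which is what allows the same batch to be reused for several gradient steps.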
policy-gradient, gflownet, reinforcement-learning