Research
Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training
The article introduces Curiosity-Critic, an intrinsic reward for world model training that drives exploration by the improvement in cumulative prediction error rather than by the error on the current transition alone. A learned critic estimates the asymptotic error baseline, allowing the method to distinguish learnable transitions from inherently stochastic ones. In experiments on a stochastic grid world, it trains faster and reaches a more accurate final world model than traditional curiosity-based approaches.
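To make the role of the error baseline concrete, here is a minimal sketch. It is not the article's formulation. It assumes the intrinsic reward can be approximated as the current prediction error minus the critic's estimate of the asymptotic error, clipped at zero. The function name `intrinsic_reward`, its parameters, and the numbers are hypothetical and chosen only for illustration.

```python
def intrinsic_reward(prediction_error: float, baseline_error: float) -> float:
    """Illustrative reward: the part of the error that is assumed to be reducible.

    prediction_error: the world model's current error on a transition.
    baseline_error:   a critic's estimate of the error that would remain after
                      training converges (assumed to be given here; no critic
                      is trained in this sketch).
    """
    # Clip at zero so a transition is never penalized for being below baseline.
    return max(prediction_error - baseline_error, 0.0)


# Hypothetical numbers: two transitions with the same current error.
# The first is learnable (low asymptotic error), the second is pure noise.
learnable = intrinsic_reward(prediction_error=0.8, baseline_error=0.05)
stochastic = intrinsic_reward(prediction_error=0.8, baseline_error=0.8)

print(f"learnable transition reward:  {learnable:.2f}")
print(f"stochastic transition reward: {stochastic:.2f}")
```

Under these assumptions, a plain prediction-error reward would score both transitions identically, while subtracting the baseline leaves reward only where the world model can still improve.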
curiosity, reinforcement-learning, world-model