Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes
The paper introduces QR-MAX, a model-based reinforcement learning algorithm designed for discrete non-Markovian reward decision processes (NMRDPs), in which the reward depends on the history of states and actions rather than only on the current transition. It targets a gap in existing methods, which lack formal guarantees on optimality and sample efficiency. QR-MAX factorizes the problem: it learns the Markovian transition dynamics separately from the handling of the non-Markovian reward, and the paper proves that it converges in the PAC (probably approximately correct) sense to an ε-optimal policy with polynomial sample complexity. An extension to continuous state spaces, Bucket-QR-MAX, uses a SimHash-based discretizer to map continuous states to discrete buckets. The paper reports that this improves sample efficiency and robustness in complex environments, which matters for practitioners working on RL tasks with temporal dependencies.
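The two ideas in the summary, learning Markovian transitions separately from the reward structure and discretizing continuous states with SimHash, can be illustrated with a minimal Python sketch. It assumes that the non-Markovian reward is tracked by a known finite automaton (a DFA with transition table `delta`, driven by a labeling of buckets) and that transitions are learned with an R-MAX-style visit threshold `m_known`. `SimHashDiscretizer`, `FactoredCounts`, `product_dist`, `label`, and `delta` are hypothetical names for illustration, not the paper's code or API.

```python
import numpy as np
from collections import defaultdict


class SimHashDiscretizer:
    """Hypothetical helper: maps a continuous state to one of 2**n_bits buckets."""

    def __init__(self, state_dim: int, n_bits: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One random Gaussian hyperplane per hash bit (standard SimHash construction).
        self.planes = rng.standard_normal((n_bits, state_dim))
        # Powers of two used to pack the bit vector into a single integer id.
        self.weights = 2 ** np.arange(n_bits, dtype=np.int64)

    def bucket(self, state) -> int:
        # Bit i is 1 when the state lies on the positive side of hyperplane i.
        bits = (self.planes @ np.asarray(state, dtype=float)) > 0
        return int(bits.astype(np.int64) @ self.weights)


class FactoredCounts:
    """R-MAX-style empirical model of the Markovian part only.

    Keys are (bucket, action); the reward-automaton state is deliberately NOT
    part of the key, so each sample is shared across all automaton states
    (an assumed reading of the factorization).
    """

    def __init__(self, m_known: int):
        self.m_known = m_known  # visits needed before (b, a) counts as "known"
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, b: int, a: int, b_next: int) -> None:
        self.counts[(b, a)][b_next] += 1

    def is_known(self, b: int, a: int) -> bool:
        return sum(self.counts.get((b, a), {}).values()) >= self.m_known

    def next_dist(self, b: int, a: int) -> dict:
        # Maximum-likelihood estimate of P(b' | b, a); empty if never visited.
        row = self.counts.get((b, a), {})
        total = sum(row.values())
        return {b2: n / total for b2, n in row.items()}


def product_dist(model, q, b, a, label, delta):
    """Distribution over (b', q') in the product of learned dynamics and a known DFA.

    `label` (bucket -> symbol) and `delta` ((q, symbol) -> q') are hypothetical
    stand-ins for the reward automaton; the paper's formulation may differ.
    """
    out = defaultdict(float)
    for b2, p in model.next_dist(b, a).items():
        out[(b2, delta[(q, label(b2))])] += p
    return dict(out)
```

Because `FactoredCounts` keys only on `(bucket, action)`, one observed transition updates the model for every automaton state at once, instead of learning separate statistics for each (bucket, automaton-state) pair. Note that plain SimHash depends only on the direction of the state vector, so states that differ only in scale share a bucket; the paper's discretizer may normalize or preprocess states differently.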
reinforcement learning, nmrp, qr-max