Bellman-Taylor Score Decoding for Markov Decision Processes with State-Dependent Feasible Action Sets
The paper introduces Bellman-Taylor score decoding, a framework for Markov Decision Processes (MDPs) with state-dependent feasible action sets. It moves policy learning into a Euclidean score space and relies on an action decoder to ensure that every selected action is feasible. This lets standard deep reinforcement learning (DRL) algorithms be used without differentiating through the decoder. The framework also comes with a performance guarantee that separates the optimality gap into a structural error and an algorithmic error. On queueing network control problems it achieves near-optimal performance and significantly outperforms benchmarks on larger instances, a result relevant to practitioners whose MDPs carry complex operational constraints.
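To make the idea of decoding a score into a feasible action concrete, the following is a minimal sketch under stated assumptions, not the paper's actual decoder. It assumes a hypothetical single-server queueing system in which the server may only serve a non-empty queue, so the feasible action set depends on the state. The names `feasible_actions`, `decode`, and `IDLE`, and the rule "pick the feasible action with the highest score", are illustrative choices that do not come from the source.

```python
# Illustrative sketch only: a toy decoder for a hypothetical single-server
# queueing system. The decoding rule (argmax of score over the feasible set)
# is an assumption made for this example, not the paper's method.

IDLE = -1  # hypothetical action meaning "serve nothing"


def feasible_actions(queue_lengths):
    """Return the queues the server may serve in this state.

    Assumption: a queue can be served only if it is non-empty; if every
    queue is empty, the only feasible action is to idle.
    """
    nonempty = [i for i, q in enumerate(queue_lengths) if q > 0]
    return nonempty if nonempty else [IDLE]


def decode(score, queue_lengths):
    """Map an unconstrained score vector to a feasible action.

    The score has one real-valued entry per queue and is not restricted
    by the state. Feasibility is enforced here, outside the policy, by
    restricting the argmax to the state's feasible set.
    """
    actions = feasible_actions(queue_lengths)
    if actions == [IDLE]:
        return IDLE
    return max(actions, key=lambda i: score[i])


if __name__ == "__main__":
    score = [0.2, 1.5, -0.3]           # e.g. output of a learned policy
    print(decode(score, [3, 0, 1]))    # queue 1 is empty, so it is skipped
    print(decode(score, [1, 2, 1]))    # all queues are feasible
    print(decode(score, [0, 0, 0]))    # nothing to serve
```

In this sketch the policy only ever produces a vector in Euclidean space, and the decoder is treated as a fixed part of the environment. That is one way to see why a standard DRL algorithm could be trained on scores without gradients flowing through the decoding step.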