Agents
Phase-Aware Guidance Injection for Recurrent MAPPO in Assembly-Line Disruption Recovery
The article presents a phase-aware guidance injection framework that enhances recurrent MAPPO (RMAPPO) for disruption recovery in assembly lines. The framework applies a logit-level action bias so that heterogeneous external recovery knowledge can inform decision-making, improving abnormal recovery time (ART) and on-time delivery (OTD). Experimental results indicate that rule-based guidance significantly outperforms the other methods, while replay-based and online LLM guidance offer diminishing but still valuable support. The results highlight the potential for adaptive decision-making without altering the underlying actor architecture.
reinforcement learning, policy, scheduling
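
The logit-level action bias mentioned in the summary can be illustrated with a minimal sketch. It is not the article's implementation: the phase names, the per-phase weights in `PHASE_WEIGHTS`, the function `inject_guidance`, and the example numbers are all assumptions made for illustration. The sketch only shows the general idea of adding a phase-scaled external bias to the actor's logits before the softmax, so the actor network itself is left unchanged.

```python
import math

# Hypothetical per-phase guidance strengths; the source gives no values.
PHASE_WEIGHTS = {"normal": 0.0, "disruption": 2.0, "recovery": 1.0}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inject_guidance(actor_logits, guidance_bias, phase):
    """Add a phase-scaled bias to the actor's logits and return action probabilities.

    actor_logits:  raw logits produced by the (unchanged) actor network
    guidance_bias: per-action preference from an external source
                   (for example a rule, a replayed case, or an LLM suggestion)
    phase:         key into PHASE_WEIGHTS (assumed phase labels)
    """
    weight = PHASE_WEIGHTS[phase]
    biased = [l + weight * b for l, b in zip(actor_logits, guidance_bias)]
    return softmax(biased)


# Illustrative numbers only: three candidate actions, guidance prefers action 1.
actor_logits = [0.2, 0.1, -0.3]
guidance_bias = [0.0, 1.0, 0.0]

probs_normal = inject_guidance(actor_logits, guidance_bias, "normal")
probs_disrupt = inject_guidance(actor_logits, guidance_bias, "disruption")
```

With a weight of zero in the assumed "normal" phase, the policy is exactly the actor's own distribution; during the assumed "disruption" phase, probability mass shifts toward the guided action. Keeping the bias additive in logit space means the guidance can be switched off or rescaled per phase without retraining or restructuring the actor.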