Training
ReCal: Reward Calibration for RL-based LLM Routing
The article introduces ReCal, a reward calibration framework for reinforcement learning (RL)-based routing of large language models (LLMs). It combines a hierarchical reward decomposition mechanism with a distribution-aware optimization strategy to address ambiguous credit assignment and optimization bias on heterogeneous tasks. Experiments on seven datasets indicate that ReCal improves both routing performance and training stability, which is relevant to practitioners who need to select among models across diverse applications.
llm routing reinforcement-learning reward-calibration