Training
Learning User Simulators with Turing Rewards
The article introduces Turing-RL, a reinforcement learning approach that trains user simulators with a Turing-test-based reward. The reward is discriminative: an LLM judge, given the conversation history, assesses how indistinguishable a generated response is from what the actual user wrote. The article reports superior performance for Turing-RL on user simulation tasks spanning conversational chat and Reddit discussions. This suggests that optimizing for indistinguishability, rather than strict response matching, yields more effective user simulators, which matters for developing more realistic AI agents.
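To make the reward idea concrete, here is a minimal sketch of one way a discriminative Turing reward could be computed. It is not the article's implementation: the pairwise setup (the judge sees the real and generated replies in random order and guesses which one is real), the binary 0/1 reward, and the names `turing_reward`, `Judge`, and `toy_judge` are all assumptions made for illustration. The `toy_judge` heuristic merely stands in for an LLM judge.

```python
import random
from typing import Callable, Sequence

# Hypothetical judge interface (an assumption, not from the article):
# given the history and two candidate replies, return the index (0 or 1)
# of the reply the judge believes the real user wrote.
Judge = Callable[[Sequence[str], str, str], int]


def turing_reward(
    history: Sequence[str],
    real_reply: str,
    generated_reply: str,
    judge: Judge,
    rng: random.Random,
) -> float:
    """Return 1.0 if the judge mistakes the generated reply for the real one, else 0.0."""
    # Randomize positions so the judge cannot exploit a fixed ordering.
    generated_first = rng.random() < 0.5
    if generated_first:
        first, second = generated_reply, real_reply
    else:
        first, second = real_reply, generated_reply

    picked = judge(history, first, second)
    generated_index = 0 if generated_first else 1
    return 1.0 if picked == generated_index else 0.0


def toy_judge(history: Sequence[str], reply_a: str, reply_b: str) -> int:
    """Stand-in for an LLM judge: guesses that the shorter reply is the real user's."""
    return 0 if len(reply_a) <= len(reply_b) else 1


if __name__ == "__main__":
    rng = random.Random(0)
    history = ["Assistant: How can I help you today?"]
    real = "my wifi keeps dropping"
    # A terse, user-like reply fools the toy judge; a formal one does not.
    print(turing_reward(history, real, "wifi bad", toy_judge, rng))
    print(turing_reward(
        history, real, "I am experiencing intermittent connectivity issues.", toy_judge, rng
    ))
```

In this sketch the simulator is rewarded only when the judge is fooled, so a reply can score well without matching the real user's wording, which mirrors the contrast with strict response matching drawn above. An RL training loop would feed this scalar to a policy-gradient update; that part is omitted here.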
user simulators, reinforcement learning, llm