Training · arXiv cs.CL · 14 d ago

Learning User Simulators with Turing Rewards

The paper introduces Turing-RL, a reinforcement learning approach for training user simulators with a Turing-test-based reward. An LLM judge, given the conversation history, assesses whether a generated response can be distinguished from the real user's response, and this discriminative judgment serves as the reward. The paper reports superior performance on user simulation tasks spanning conversational chat and Reddit discussions. The result suggests that optimizing for indistinguishability, rather than for strict matching of the reference response, produces more effective user simulators, which matter for building more realistic AI agents.
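To make the reward idea concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes a pairwise setup in which the judge sees the history plus the real and generated replies in random order and guesses which one the real user wrote; the summary does not specify the exact protocol. The names `turing_reward`, `Judge`, and `length_judge` are hypothetical, and `length_judge` is a toy stand-in for an actual LLM call.

```python
import random
from typing import Callable, Sequence

# Hypothetical judge interface (an assumption, not from the paper): given the
# history and two candidate replies, return the index (0 or 1) of the reply
# the judge believes the real user wrote.
Judge = Callable[[Sequence[str], str, str], int]


def turing_reward(
    history: Sequence[str],
    generated: str,
    real: str,
    judge: Judge,
    rng: random.Random,
) -> float:
    """Return 1.0 if the judge mistakes the generated reply for the real one."""
    # Randomize presentation order so the judge cannot exploit position.
    generated_first = rng.random() < 0.5
    first, second = (generated, real) if generated_first else (real, generated)
    picked = judge(history, first, second)
    generated_index = 0 if generated_first else 1
    return 1.0 if picked == generated_index else 0.0


def length_judge(history: Sequence[str], a: str, b: str) -> int:
    """Toy stand-in for an LLM judge: guesses that the shorter reply is real."""
    return 0 if len(a) <= len(b) else 1


if __name__ == "__main__":
    rng = random.Random(0)
    history = ["Assistant: Did the fix work?"]
    real = "nah, still broken"
    print(turing_reward(history, "nope", real, length_judge, rng))
    print(turing_reward(history, "Thank you, but the issue persists on my end.",
                        real, length_judge, rng))
```

In an RL loop, a scalar like this would be the reward for the simulator's sampled reply. The point of the design is that a reply can score well without matching the reference wording, as long as the judge cannot tell it apart from the real one.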

user simulators · reinforcement learning · llm · relevance 0.00 · engagement 0.00
