Agents
RogueAI: A Reverse Turing Test for Detecting Licensed AI Deception in Dialogue
RogueAI is a new interactive web application that operationalizes a modern variant of the Turing Test, focusing on detecting deception in dialogue between two indistinguishable Large Language Model agents, where one is licensed to deceive. The system features a gameplay loop where players must identify the deceptive agent within a limited turn budget, and initial pilot results indicate that a heuristic can achieve 75.6% accuracy in detecting deception, while human players only reached 56.6%. This disparity highlights the potential of RogueAI as a tool for data collection, educational purposes, and evaluating the integrity of honesty-trained models, emphasizing the need for improved human discernment in AI interactions.
rogueaideceptionturing-test