Agents
MiroBench: Benchmarking Realism in Agentic Simulation of Real-world Discussions
MiroBench is a benchmark for evaluating how realistically LLM agents simulate real-world discussions, built on a dataset of 4,292 Reddit threads. It compares generated discussions against real ones along four dimensions: repetition and semantic uniformity, narrative content, toxicity and aggression, and structural complexity. The comparison reveals significant distributional mismatches in current models. For practitioners working on LLM-based social simulations, the benchmark provides structured metrics for measuring and improving fidelity.
simulation, benchmark, llm agents
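
To make the idea of a distributional mismatch concrete, here is a minimal sketch of how one of the four dimensions, structural complexity, might be compared between real and generated threads. It is not MiroBench's actual metric: the thread representation (a mapping from comment id to parent id), the helper names `max_depth` and `total_variation`, the choice of total variation distance, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def max_depth(parent_of):
    """Return the deepest reply chain in a thread.

    `parent_of` maps a comment id to its parent id (None for a top-level
    comment). This representation is an assumption for the example.
    """
    def depth(comment_id):
        d = 1
        while parent_of[comment_id] is not None:
            comment_id = parent_of[comment_id]
            d += 1
        return d

    return max((depth(c) for c in parent_of), default=0)


def total_variation(xs, ys):
    """Total variation distance between two empirical discrete distributions."""
    px, py = Counter(xs), Counter(ys)
    support = set(px) | set(py)
    return 0.5 * sum(abs(px[v] / len(xs) - py[v] / len(ys)) for v in support)


# Toy data only: two "real" threads and two "generated" threads.
real_threads = [
    {"a": None, "b": "a", "c": "b"},  # chain of three comments
    {"a": None, "b": "a"},            # one reply
]
generated_threads = [
    {"a": None, "b": None},           # flat, no replies
    {"a": None, "b": "a"},            # one reply
]

real_depths = [max_depth(t) for t in real_threads]
gen_depths = [max_depth(t) for t in generated_threads]

# 0.0 means identical depth distributions, 1.0 means disjoint ones.
tv = total_variation(real_depths, gen_depths)
print(f"depth mismatch (total variation): {tv:.2f}")
```

The same pattern, computing a per-thread statistic and then comparing the two empirical distributions, would apply to the other dimensions with a different statistic swapped in, for example a toxicity score or a repetition rate per thread.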