Agents
WorldReasoner: Evaluating Whether Language Model Agents Forecast Events with Valid Reasoning
WorldReasoner is an evaluation framework that assesses how well language model agents forecast real-world events under uncertainty, emphasizing reasoning quality rather than accuracy alone. It comprises 345 resolved forecasting tasks derived from over 14,000 articles and scores agents on three dimensions: outcome quality, evidence quality, and reasoning quality. The results show that temporally valid retrieval significantly improves accuracy, but agents still struggle to turn grounded evidence into calibrated probabilities, a gap that matters to practitioners working on LLM forecasting.
forecasting, language model, reasoning
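The two ideas in the summary, temporally valid retrieval and calibrated probabilities, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Article` type, the `temporally_valid` helper, and the use of the Brier score on binary outcomes are not taken from WorldReasoner, whose actual task format and metrics are not described in this summary.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    # Hypothetical record type; not WorldReasoner's schema.
    title: str
    published: date


def temporally_valid(articles: list[Article], cutoff: date) -> list[Article]:
    """Keep only articles published strictly before the forecast cutoff,
    so the agent cannot see evidence from after the question was posed."""
    return [a for a in articles if a.published < cutoff]


def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and binary
    outcomes (0 or 1). Lower is better; a constant 0.5 forecast scores 0.25.
    Assumed metric for illustration only."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
```

Under these assumptions, an agent would retrieve from `temporally_valid(corpus, question_date)` and its probabilities would then be scored against the resolved outcomes. A proper scoring rule such as the Brier score rewards well-calibrated probabilities, not just a correct yes/no call.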