Research
TerraBench: Can Agents Reason Over Heterogeneous Earth-System Data?
TerraBench is a new benchmark for grounded Earth-science reasoning, introduced to address the limitations of current models in integrating heterogeneous Earth-system data. It uses the TerraAgent framework, which combines reasoning, tool calls, and observations so that LLMs can interact with high-dimensional data. The benchmark comprises 403 tasks across multiple tracks and domains, and it emphasizes that agents in scientific applications must coordinate workflows effectively and maintain data provenance.
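The loop of reasoning, tool calls, and observations described above can be illustrated with a small sketch. This is not TerraAgent's actual API, which the summary does not describe. The names `Step`, `Trace`, `run_agent`, `TOOLS`, and `PLAN` are hypothetical, the tools are string-returning stand-ins, and a fixed plan takes the place of the steps an LLM would generate. The sketch only shows how each tool call and its observation might be recorded so that data provenance can be recovered afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    """One iteration of the loop: a thought, a tool call, and its observation."""
    thought: str
    tool: str
    argument: str
    observation: str


@dataclass
class Trace:
    """Ordered record of every step the agent took."""
    steps: List[Step] = field(default_factory=list)

    def provenance(self) -> List[str]:
        # The sequence of tool calls that produced the final result.
        return [f"{s.tool}({s.argument})" for s in self.steps]


def run_agent(
    plan: List[Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
) -> Trace:
    """Execute (thought, tool, argument) triples and log each observation.

    Assumption: a real agent would ask an LLM for the next step after each
    observation; here the plan is fixed so the example stays self-contained.
    """
    trace = Trace()
    for thought, tool_name, argument in plan:
        observation = tools[tool_name](argument)
        trace.steps.append(Step(thought, tool_name, argument, observation))
    return trace


# Hypothetical stand-in tools; real tools would read and process gridded data.
TOOLS: Dict[str, Callable[[str], str]] = {
    "load_grid": lambda name: f"loaded {name}",
    "mean": lambda name: f"mean of {name} computed",
}

# Hypothetical fixed plan standing in for LLM-generated reasoning steps.
PLAN = [
    ("I need the temperature field first.", "load_grid", "sst"),
    ("Now summarise it.", "mean", "sst"),
]

trace = run_agent(PLAN, TOOLS)
print(trace.provenance())
```

Keeping the trace separate from the tools means the provenance record is built the same way regardless of which tools are registered.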
earth-science, llm, reasoning