Research · arXiv cs.AI · 7 d ago

TerraBench: Can Agents Reason Over Heterogeneous Earth-System Data?

TerraBench is a new benchmark for grounded Earth-science reasoning, introduced to address the limitations of current models in integrating heterogeneous Earth-system data. It uses the TerraAgent framework, which combines reasoning, tool calls, and observations so that LLMs can interact with high-dimensional data. The benchmark comprises 403 tasks across multiple tracks and domains, and it emphasizes the need for agents to coordinate workflows effectively and maintain data provenance in scientific applications.

earth-science · llm · reasoning
