Research
Towards Verifiable Agentic Data Science: Solving Irregular TSQA Via Tool-Grounded Reasoning
The article announces the release of IRTS-ToolBench, a benchmark of 1,700 questions spanning 10 task types and 13 domains, designed for Time Series Question Answering (TSQA) on irregularly sampled data. Existing TSQA benchmarks assume regular sampling; IRTS-ToolBench addresses that limitation with standardized inputs and a reproducible evaluation protocol for researchers working on LLM-based irregular time series analysis. For practitioners, it makes it possible to assess model performance under more realistic conditions, where observations arrive asynchronously and at variable intervals.
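To illustrate why the regular-sampling assumption matters, here is a minimal Python sketch. It is not taken from IRTS-ToolBench; the function name `time_weighted_mean` and the sample data are hypothetical, chosen only to show how a statistic computed by a tool can differ once the gaps between timestamps are taken into account.

```python
from typing import Sequence


def time_weighted_mean(times: Sequence[float], values: Sequence[float]) -> float:
    """Trapezoidal time-weighted mean of irregularly sampled observations.

    Hypothetical helper for illustration only; assumes strictly increasing
    timestamps and linear behaviour between consecutive samples.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        area += 0.5 * (values[i] + values[i - 1]) * dt
    return area / (times[-1] - times[0])


# Made-up series: three closely spaced readings, then a long gap.
times = [0.0, 1.0, 2.0, 10.0]
values = [10.0, 10.0, 10.0, 0.0]

# Treats every sample as equally spaced, ignoring the 8-unit gap.
naive_mean = sum(values) / len(values)

# Weights each interval by its actual duration.
tw_mean = time_weighted_mean(times, values)

print(f"naive mean: {naive_mean}, time-weighted mean: {tw_mean}")
```

In this made-up series the naive average overweights the dense early readings, while the time-weighted version accounts for the long gap before the final observation. Answering a question by calling such a function, rather than estimating the value in free text, is one way to read "tool-grounded" reasoning.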
data science, agentic, framework