Coding · arXiv cs.AI · 8 d ago

CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?

CODA-BENCH is presented as the first benchmark to jointly evaluate code and data intelligence in data-intensive environments, where existing benchmarks assess code-centric or data-centric capabilities in isolation. It provides a Linux sandbox built on the Kaggle ecosystem, with 1,009 tasks across 31 communities and an average of 980 files per task, to simulate realistic data complexity. Even the leading agents reach only a 61.1% success rate when they must integrate data discovery with code execution, which shows how much agent capabilities still need to improve on data-intensive tasks.

