ai-digest.dev
last updated 13 h ago
Agents · arXiv cs.AI · 7 d ago

EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis

EpiBench is a new benchmark for evaluating AI agents on short-horizon epigenomics analysis. It comprises 106 evaluations spanning CUT&Tag/CUT&RUN, ATAC-seq, ChIP-seq, and DNA methylation workflows. Across 5,088 trajectories from 16 model-harness pairs, the best-performing pair, GPT-5.5 / Pi, reached a success rate of 45.0%. The results indicate that agents can retrieve relevant data but struggle with tasks requiring deeper scientific judgment, a limitation that practitioners building AI applications for epigenomics should take into account.

benchmark · epigenomics · evaluation
relevance 0.00 · engagement 0.00