Research · arXiv cs.AI · 4 d ago

ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity

The paper introduces the Agentic Bio-Capabilities Benchmark (ABC-Bench), an evaluation suite that assesses large language model (LLM) agents on biosecurity-relevant tasks, including operating liquid-handling robots and designing DNA fragments. All tested LLM agents surpassed median expert human performance on these tasks, and OpenAI's o4-mini-high produced executable scripts for DNA assembly in wet-lab experiments. For practitioners, the benchmark shows how far LLM capabilities in bioinformatics have advanced, and the associated biosecurity implications call for careful consideration when deploying these models.

Tags: llm, biosecurity, benchmark
