ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity
The article introduces the Agentic Bio-Capabilities Benchmark (ABC-Bench), an evaluation suite that assesses large language model (LLM) agents on biosecurity-relevant tasks, including operating liquid-handling robots and designing DNA fragments. Notably, all tested LLM agents surpassed median expert human performance on these tasks, and OpenAI's o4-mini-high successfully generated executable scripts for DNA assembly in wet-lab experiments. For practitioners, the benchmark shows how far LLM capabilities in bioinformatics have advanced and why the associated biosecurity implications need careful consideration before deployment.
llm, biosecurity, benchmark