Research
TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology
The article introduces TxBench-PP, a benchmark that evaluates AI agents on small-molecule preclinical pharmacology through 100 evaluations spanning various drug discovery stages. It assesses whether agents can draw accurate conclusions from real-world assay data. Of the 16 model configurations tested, including Claude Opus 4.8 and GPT-5.5, the best achieved 59.3% accuracy in decision recovery. For AI practitioners, the benchmark offers a structured way to check whether models can interpret complex pharmacological data reliably enough for drug discovery applications.
drug discovery, ai agents, benchmark