Models · MarkTechPost · 14 d ago

OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubrics

OpenAI has released LifeSciBench, a benchmark of 750 expert-authored tasks that evaluates AI models on real life-science research across seven workflows and biological domains. Developed by 173 PhD scientists and comprising 19,020 rubric criteria, the benchmark emphasizes reasoning and decision-making rather than recall. The top-performing model, GPT-Rosalind, achieves a pass rate of only 36.1%, which leaves substantial room for improvement in life-science work. For practitioners, the benchmark offers a structured way to assess and improve AI models in complex scientific contexts.

