Research · arXiv cs.AI · 12 d ago

MedicalAgentsBench for Complex Medical Reasoning: Comparing Internalized Reasoning Models versus Externalized Agent-based Frameworks

The paper introduces MedicalAgentsBench, a benchmark for evaluating complex medical reasoning that consists of 862 clinical questions curated from eight datasets. It compares three internalized reasoning models (DeepSeek-R1, o1-mini, o3-mini) with nine externalized agent-based frameworks and finds that combining an internalized model (o3-mini) with an externalized agent framework (MDAgents) achieves the highest accuracy, 35.1%. The authors argue that the two approaches are complementary and suggest that practitioners in resource-constrained environments can improve performance through strategic model layering and optimization.

medical reasoning · llm · benchmark
