Safety · arXiv cs.AI · 12 d ago

AIPatient Arena: EHR-grounded evaluation of large language models in end-to-end clinical consultation workflows

AIPatient Arena is a newly proposed framework for evaluating the clinical utility of large language models (LLMs) in end-to-end clinical consultation workflows. It uses EHR data to build patient-specific knowledge graphs that support multi-turn interactions. In tests on a primary cohort of 437 patients and two validation cohorts, the models performed strongly on medical interview skills and ethical conduct but only moderately to poorly on information integration, handling ambiguous responses, and diagnostic reasoning. The framework emphasizes evaluating LLMs on more than final-answer accuracy: how well they gather, interpret, and communicate information throughout a consultation matters to practitioners building AI tools for healthcare.

LLM · clinical evaluation · EHR
