ai-digest.dev
last updated 2 h ago
Research · arXiv cs.AI · 9 d ago

EHRNote-ChatQA: A Benchmark for Evidence-Grounded Multi-Turn Clinical Question Answering over Longitudinal Discharge Summaries

EHRNote-ChatQA is a benchmark for evidence-grounded, multi-turn clinical question answering built on longitudinal discharge summaries from the MIMIC-IV dataset. It comprises 967 patient-level multi-turn samples and 16,072 QA pairs verified by medical experts, spanning eight clinical categories. It highlights challenges that LLMs face, including difficulty grounding answers in evidence and errors that compound across turns. For practitioners, it offers an evaluation framework closer to real clinical QA, where a system must maintain context across multiple documents.

clinical · question answering · EHR