Research
EHRNote-ChatQA: A Benchmark for Evidence-Grounded Multi-Turn Clinical Question Answering over Longitudinal Discharge Summaries
EHRNote-ChatQA is a benchmark for evidence-grounded multi-turn clinical question answering over longitudinal discharge summaries from the MIMIC-IV dataset. It comprises 967 patient-level multi-turn samples and 16,072 medical-expert-verified QA pairs across eight clinical categories. The benchmark highlights challenges that LLMs face, such as difficulty grounding answers in evidence and errors that compound across turns. For practitioners, it provides a more relevant evaluation framework for developing clinical QA systems that require contextual understanding across multiple documents.
clinical, question answering, EHR