BLUEmed: Retrieval-Augmented Multi-Agent Debate for Clinical Error Detection
BLUEmed is a multi-agent debate framework for detecting terminology substitution errors in clinical notes, built on a hybrid Retrieval-Augmented Generation (RAG) approach. It decomposes each clinical note into sub-queries and retrieves supporting evidence for them using several retrieval methods. Two domain-expert agents then analyze the note independently, and any disagreement between them is resolved through structured counter-argumentation. On a clinical terminology substitution benchmark, BLUEmed reached a best accuracy of 69.13%. The results show that retrieval augmentation and structured debate both improve performance, especially for models that are strong at instruction following and clinical language comprehension. This makes the framework relevant to practitioners working to improve automated error detection in healthcare.
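The following is a minimal sketch of the pipeline shape described above: decompose, retrieve, analyze independently, then debate on disagreement. It is not BLUEmed's implementation and makes several assumptions. Every name (`Opinion`, `decompose`, `retrieve`, `debate`, `strict_agent`, `lenient_agent`) is hypothetical. A toy keyword-overlap retriever stands in for the hybrid RAG component. The two "expert agents" are rule-based stubs where the real system would use language models. The final tie-break rule (flag the note if the agents still disagree) is an illustrative choice, not something the source specifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Opinion:
    has_error: bool  # True if the agent believes a term was substituted
    rationale: str


# Hypothetical agent signature: (note, evidence, opponent's opinion or None) -> Opinion
Agent = Callable[[str, List[str], Optional[Opinion]], Opinion]


def decompose(note: str) -> List[str]:
    """Hypothetical decomposition: one sub-query per sentence."""
    return [s.strip() for s in note.split(".") if s.strip()]


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy lexical retriever (stand-in for hybrid RAG): rank by shared tokens."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q & set(p.lower().split())), reverse=True)
    return ranked[:k]


def debate(note: str, corpus: List[str], agent_a: Agent, agent_b: Agent,
           max_rounds: int = 2) -> bool:
    # Gather de-duplicated evidence for every sub-query.
    evidence: List[str] = []
    for sub_query in decompose(note):
        for passage in retrieve(sub_query, corpus):
            if passage not in evidence:
                evidence.append(passage)
    # Independent first-pass analyses (neither agent sees the other).
    a = agent_a(note, evidence, None)
    b = agent_b(note, evidence, None)
    for _ in range(max_rounds):
        if a.has_error == b.has_error:
            return a.has_error
        # Counter-argumentation: each agent responds to the other's last opinion.
        a, b = agent_a(note, evidence, b), agent_b(note, evidence, a)
    # Assumed tie-break: if agents agree, use that verdict; if still split, flag.
    return a.has_error or b.has_error


# Toy corpus and stub agents for illustration only.
corpus = [
    "metoprolol is a beta blocker used for hypertension",
    "metformin is first-line therapy for type 2 diabetes",
    "insulin lowers blood glucose",
]


def strict_agent(note: str, evidence: List[str], other: Optional[Opinion]) -> Opinion:
    # Flags an error when the note pairs metoprolol with glucose control.
    flagged = "metoprolol" in note and "glucose" in note
    return Opinion(flagged, "metoprolol is not a glucose-lowering drug")


def lenient_agent(note: str, evidence: List[str], other: Optional[Opinion]) -> Opinion:
    # Concedes only if the opponent flags an error and retrieved evidence supports it.
    if other is not None and other.has_error and any("beta blocker" in p for p in evidence):
        return Opinion(True, "concedes: evidence says metoprolol is a beta blocker")
    return Opinion(False, "no substitution detected")


note = "Patient with type 2 diabetes. Started metoprolol for glucose control."
print(debate(note, corpus, strict_agent, lenient_agent))
```

In this toy run, the agents disagree at first. After one round of counter-argumentation, the lenient agent concedes because the retrieved passage supports the strict agent's claim, so the note is flagged. Keeping the first pass independent and exchanging opinions only on disagreement mirrors the source's description of independent analyses followed by structured counter-argumentation.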