Research
PSEBench: A Controllable and Verifiable Benchmark for Evaluating LLMs in Patient Safety Event Triage
PSEBench is a new benchmark for evaluating LLMs on patient safety event triage. It comprises 5,074 cases derived from Minnesota's 29 Reportable Adverse Health Events. Its policy-grounded construction method uses clause cards as auditable decision specifications and supports closed-loop verification, which enables LLMs to generate missing information and handle ambiguous cases. For practitioners, the benchmark offers a structured framework for assessing how reliable and effective LLMs are in high-stakes clinical decision-making.
llm, patient safety, benchmark