ai-digest.dev
Research · arXiv cs.AI · 21 h ago

PSEBench: A Controllable and Verifiable Benchmark for Evaluating LLMs in Patient Safety Event Triage

PSEBench is a new benchmark for evaluating LLMs on patient safety event triage. It comprises 5,074 cases derived from Minnesota's 29 Reportable Adverse Health Events. The benchmark is built with a policy-grounded methodology: clause cards serve as auditable decision specifications, and closed-loop verification enables LLMs to generate missing information and handle ambiguous cases. For practitioners, it offers a structured way to assess how reliably and effectively LLMs perform in high-stakes clinical decision-making.

llm · patient safety · benchmark