Research
Before the Labels: How Dataset Construction Shapes Suicidality Detection in Clinical Text
The article examines how the ScAN dataset was constructed from MIMIC-III clinical notes. It argues that the dataset's design choices, including governance constraints, ICD-based cohort selection, and single-annotator labeling, shape how suicidality detection is operationalized in clinical NLP. The dataset's labels reflect clinicians' documented judgments and assumptions about intent, and so they can oversimplify the complexity of suicidality. The analysis urges practitioners to scrutinize the assumptions underlying such datasets rather than treat the labels as definitive ground truth in clinical applications.
suicidality detection, dataset construction, clinical text