Research
In-Domain Supervised Pathology Report Classification: A Reproducible Pipeline from Data Curation to Production-Matched Evaluation
The article presents a reproducible supervised pipeline for classifying pathology reports, motivated by the performance drop that biomedical NLP models suffer when applied across different cancer registries. The pipeline uses facility-stratified sampling and a blinded manual audit to curate an in-domain training set and a production-matched holdout. On a 418k-report dataset it achieves a false-negative rate (FNR) of 0.003 and an F1 score of 0.922, outperforming the baseline model. For practitioners, the methodology offers a way to manage label noise and make NLP classifiers more reliable in real-world clinical settings.
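As a rough illustration of two ideas named in the summary, the sketch below draws a facility-stratified sample and computes FNR and F1 from binary labels. It is not the article's implementation: the `facility` field name, the equal per-facility quota, and both function names are assumptions made for the example, and the article's actual allocation scheme is not specified here.

```python
import random
from collections import defaultdict


def stratified_sample(reports, per_facility, seed=0):
    """Draw up to `per_facility` reports from each facility (hypothetical quota)."""
    rng = random.Random(seed)
    by_facility = defaultdict(list)
    for report in reports:
        # "facility" is an assumed field name, not taken from the article.
        by_facility[report["facility"]].append(report)

    sample = []
    for facility in sorted(by_facility):  # sorted for a reproducible order
        group = by_facility[facility]
        k = min(per_facility, len(group))  # small facilities contribute all they have
        sample.extend(rng.sample(group, k))
    return sample


def fnr_and_f1(y_true, y_pred):
    """Return (FNR, F1) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # FNR = FN / (FN + TP): the share of true positives the model missed.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    # F1 = 2TP / (2TP + FP + FN): harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return fnr, f1
```

Sampling per facility keeps a few high-volume sites from dominating the curated set, and reporting FNR alongside F1 separates missed positive reports from overall precision-recall balance.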
biomedical, nlp, classification