Research
Redact or Keep? A Fully Local AI Cascade for Educational Dialogue De-Identification
This article presents a fully local cascade for de-identifying educational dialogue, addressing the tension between privacy and accuracy when handling personally identifiable information (PII) in transcripts. The system pairs a recall-first union proposer with two lightweight encoders and a context-aware reviewer. On math tutoring transcripts it achieves a macro F1 score of 0.958, outperforming both a same-family LLM-only baseline (0.767) and a commercial API (0.706). The results suggest that problem formulation matters more than model scale, and they give practitioners a viable option for maintaining data governance while processing sensitive educational content.
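To make the cascade shape concrete, here is a minimal Python sketch of one possible reading of "union proposer plus reviewer". It is not the article's implementation: the two regex functions are hypothetical stand-ins for the two lightweight encoders, the reviewer is a toy next-word rule rather than a context-aware model, and every name (`propose_capitalized`, `propose_after_cue`, `review`, `deidentify`, `CURRICULAR_NEXT_WORDS`) is an assumption made for illustration. The sketch only shows the control flow: any span flagged by either proposer becomes a candidate (favoring recall), and the reviewer then decides whether to redact or keep it.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets into the text


def propose_capitalized(text: str) -> List[Span]:
    """Hypothetical stand-in for encoder A: flag capitalized words."""
    return [m.span() for m in re.finditer(r"\b[A-Z][a-z]+\b", text)]


def propose_after_cue(text: str) -> List[Span]:
    """Hypothetical stand-in for encoder B: flag the word after a naming cue."""
    return [m.span(1) for m in re.finditer(r"\b(?:named|called)\s+(\w+)", text)]


PROPOSERS: List[Callable[[str], List[Span]]] = [propose_capitalized, propose_after_cue]

# Assumed toy rule: a candidate followed by one of these words is treated
# as curricular content (e.g. "Pythagorean theorem") and kept.
CURRICULAR_NEXT_WORDS = {"theorem", "triangle", "method"}


def union_proposals(text: str, proposers=PROPOSERS) -> List[Span]:
    """Recall-first step: a span flagged by ANY proposer becomes a candidate."""
    spans = set()
    for propose in proposers:
        spans.update(propose(text))
    return sorted(spans)


def review(text: str, span: Span) -> bool:
    """Toy reviewer: return True to redact, False to keep, using the next word."""
    following = text[span[1]:].split()
    next_word = following[0].strip(".,!?").lower() if following else ""
    return next_word not in CURRICULAR_NEXT_WORDS


def deidentify(text: str) -> str:
    """Run the cascade; assumes candidate spans do not partially overlap."""
    to_redact = [s for s in union_proposals(text) if review(text, s)]
    # Replace from right to left so earlier offsets stay valid.
    for start, end in sorted(to_redact, reverse=True):
        text = text[:start] + "[REDACTED]" + text[end:]
    return text


sample = "great job Maria, now use the Pythagorean theorem. my friend is called sam."
print(deidentify(sample))
# great job [REDACTED], now use the Pythagorean theorem. my friend is called [REDACTED].
```

In this toy run each proposer catches a candidate the other misses ("Maria" and "Pythagorean" versus the lowercase "sam"), which is the motivation for taking a union. The reviewer then keeps "Pythagorean" because of its context, which stands in for the redact-or-keep decision named in the title.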
de-identification, education, llm