Research · arXiv cs.CL · 16 d ago

REDACT: A Systematically Controlled Multilingual Benchmark for Personal Information Detection

REDACT is a multilingual benchmark for personally identifiable information (PII) detection. It contains 13,427 records with 324,078 entity annotations, covering 51 entity types across 25 languages. A strength-2 covering-array sampler controls nine generation axes, meaning that every pair of values drawn from any two axes is represented in the sampled configurations. The benchmark is used to evaluate PII detectors including Presidio, GLiNER, OpenAI Privacy Filter, GPT-4.1, and Claude Sonnet 4.6. For practitioners, it offers a structured way to compare model performance across high-stakes data and sensitivity tiers, and it highlights the limitations of rule-based detectors relative to LLMs.
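To make the sampling idea concrete, here is a minimal sketch of a greedy strength-2 (pairwise) covering-array generator. It is an illustration only and not the benchmark's own sampler: the function name `pairwise_cover`, the four axes, and their values are all hypothetical, since the summary does not list REDACT's nine axes.

```python
from itertools import combinations


def pairwise_cover(axes):
    """Greedily build rows so every value pair from any two axes appears at least once."""
    names = list(axes)
    sizes = [len(axes[n]) for n in names]
    # Every (axis i, value index, axis j, value index) pair with i < j still to be covered.
    uncovered = {
        (i, vi, j, vj)
        for i, j in combinations(range(len(names)), 2)
        for vi in range(sizes[i])
        for vj in range(sizes[j])
    }
    rows = []
    while uncovered:
        # Seed the row with one uncovered pair so each iteration makes progress.
        i, vi, j, vj = min(uncovered)
        row = {i: vi, j: vj}
        for k in range(len(names)):
            if k in row:
                continue

            def gain(v):
                # Count the uncovered pairs that value v would cover with the axes fixed so far.
                return sum(
                    ((k, v, o, ov) if k < o else (o, ov, k, v)) in uncovered
                    for o, ov in row.items()
                )

            row[k] = max(range(sizes[k]), key=gain)
        # Mark every pair contained in the finished row as covered.
        uncovered -= {
            (a, row[a], b, row[b])
            for a, b in combinations(range(len(names)), 2)
        }
        rows.append({names[k]: axes[names[k]][row[k]] for k in range(len(names))})
    return rows


# Hypothetical axes, chosen only for illustration.
axes = {
    "language": ["en", "de", "ja"],
    "domain": ["medical", "finance", "legal"],
    "register": ["formal", "casual"],
    "length": ["short", "long"],
}
rows = pairwise_cover(axes)
print(len(rows), "configurations instead of the 36 in the full cross product")
for r in rows:
    print(r)
```

The appeal of this design is that the number of configurations grows far more slowly than the full cross product of the axes, while each pairwise interaction between axis values is still exercised at least once.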

