Safety · arXiv cs.AI · 10 d ago

Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data

The paper presents a customizable empirical framework for auditing synthetic data generated by LLMs: it detects data disclosures and explains their cause. It separates "true disclosures" from "phantom disclosures" by running statistical hypothesis tests on partitioned input data, and it needs neither access to the model nor additional training. The approach is model-agnostic and is reported to give tighter empirical lower bounds on privacy leakage than existing methods, which makes it relevant to practitioners concerned with privacy in synthetic data generation.
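The summary does not spell out the test itself, so the following is only a minimal sketch of what hypothesis testing on partitioned input data could look like, not the paper's procedure. It assumes the input records are split into a part that was fed to the generator (`members`) and a part that was withheld (`holdout`), that a disclosure is approximated by an exact match in the synthetic output, and that matches on withheld records stand in for phantom disclosures because the generator never saw them. All names (`audit`, `fisher_one_sided`, `alpha`) are hypothetical, and the one-sided Fisher exact test is a stand-in choice.

```python
from math import comb


def fisher_one_sided(member_hits, member_total, holdout_hits, holdout_total):
    """One-sided Fisher exact test (hypergeometric upper tail).

    Returns P(member hits >= observed) under the null hypothesis that
    matches are spread independently of the partition a record is in.
    """
    total_hits = member_hits + holdout_hits
    total = member_total + holdout_total
    denom = comb(total, total_hits)
    upper = min(total_hits, member_total)
    tail = sum(
        comb(member_total, k) * comb(holdout_total, total_hits - k)
        for k in range(member_hits, upper + 1)
    )
    return tail / denom


def audit(members, holdout, synthetic, alpha=0.05):
    """Compare match rates of the two partitions against the synthetic data.

    Assumption: exact string equality approximates a disclosure. A real
    audit would likely need a fuzzier, domain-specific match criterion.
    """
    synthetic_set = set(synthetic)
    member_hits = sum(r in synthetic_set for r in members)
    holdout_hits = sum(r in synthetic_set for r in holdout)  # coincidental matches
    p_value = fisher_one_sided(member_hits, len(members), holdout_hits, len(holdout))
    return {
        "member_hits": member_hits,
        "holdout_hits": holdout_hits,
        "p_value": p_value,
        "flag": p_value < alpha,  # excess matches on seen records
    }


if __name__ == "__main__":
    members = [f"m{i}" for i in range(10)]
    holdout = [f"h{i}" for i in range(10)]
    synthetic = members[:6] + ["unrelated record"]
    print(audit(members, holdout, synthetic))
```

Only the input partitions and the generated output are needed here, which matches the summary's point that the audit requires no model access or additional training.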

synthetic data · privacy · auditing
