Research
Marginal Alignment Does Not Guarantee Joint-Distribution Fidelity: An Official-Reference Audit of Nemotron-Personas-Korea with Cross-Locale Replication
This article audits NVIDIA's Nemotron-Personas-Korea (NPK), a synthetic persona dataset, against official demographic references. It introduces a method, the Independence-Assumption Footprint (IAF), for assessing joint-distribution fidelity. The IAF reveals discrepancies between the synthetic personas and the official references, including mismatches in major-by-occupation distributions and inconsistencies in age profiles related to military service. The study argues that marginal-alignment claims for synthetic datasets need to be accompanied by joint audits, and it releases audit artifacts so that similar evaluations can be run on other persona resources, a step toward more reliable synthetic data in AI applications.
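The title's claim, that matching marginals does not imply a matching joint distribution, can be illustrated with a small sketch. This is not the paper's IAF, whose definition is not given in this summary. It is a generic check, under assumed toy data, that compares an observed joint distribution with the product of its marginals using total variation distance. The attribute values, counts, and function names (`marginals`, `joint`, `tv_distance`, `independence_gap`) are hypothetical.

```python
from collections import Counter
from itertools import product


def marginals(pairs):
    """Return the two marginal distributions of a list of (a, b) pairs."""
    n = len(pairs)
    a = Counter(x for x, _ in pairs)
    b = Counter(y for _, y in pairs)
    return ({k: v / n for k, v in a.items()},
            {k: v / n for k, v in b.items()})


def joint(pairs):
    """Return the empirical joint distribution of a list of (a, b) pairs."""
    n = len(pairs)
    return {k: v / n for k, v in Counter(pairs).items()}


def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def independence_gap(pairs):
    """Distance between the observed joint and the product of its marginals.

    A value of 0 means the two attributes look independent in the sample.
    """
    pa, pb = marginals(pairs)
    indep = {(x, y): pa[x] * pb[y] for x, y in product(pa, pb)}
    return tv_distance(joint(pairs), indep)


# Hypothetical toy data: (major, occupation) pairs, 100 records each.
# In the reference, major and occupation are strongly associated.
reference = ([("eng", "engineer")] * 40 + [("eng", "teacher")] * 10 +
             [("edu", "engineer")] * 10 + [("edu", "teacher")] * 40)
# The synthetic sample draws the two attributes independently.
synthetic = ([("eng", "engineer")] * 25 + [("eng", "teacher")] * 25 +
             [("edu", "engineer")] * 25 + [("edu", "teacher")] * 25)

# Both marginals agree exactly between the two samples.
same_marginals = marginals(reference) == marginals(synthetic)
# The joint distributions nevertheless differ.
joint_gap = tv_distance(joint(reference), joint(synthetic))

print("marginals match:", same_marginals)
print("joint TV distance:", round(joint_gap, 3))
print("independence gap (reference):", round(independence_gap(reference), 3))
print("independence gap (synthetic):", round(independence_gap(synthetic), 3))
```

In this toy case a marginal-only comparison reports a perfect match, while the joint comparison exposes the missing major-by-occupation association. An audit of a real persona dataset would replace the toy lists with official cross-tabulations.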
audit, datasets, alignment