Research
Ethical and Technical Limits of Deepfake Speech Datasets
A comprehensive audit of 39 deepfake speech datasets has been published, revealing significant limitations in their accessibility, documentation, and demographic coverage. Most of the datasets lack essential demographic metadata, which hinders fairness assessments and subgroup analyses. The audit also finds substantial overlap in the bona fide source corpora the datasets draw on, which may compromise the validity of cross-dataset evaluations. The work underscores the need for more robust dataset standards to make deepfake speech detection systems more reliable for practitioners.
deepfake, datasets, fairness