Research
CleanPatrick: A Benchmark for Image Data Cleaning
CleanPatrick is a large-scale benchmark for image data cleaning built on the Fitzpatrick17k dermatology dataset. It draws on 496,377 binary annotations collected from 933 medical crowd workers to identify off-topic samples, near-duplicates, and label errors. Issue detection is formalized as a ranking task and evaluated with standard ranking metrics. The benchmark lets practitioners systematically compare image-cleaning strategies, and it shows how self-supervised representations and classical anomaly detection methods perform in real-world scenarios.
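To illustrate what treating issue detection as a ranking task can look like, here is a minimal sketch in Python. It assumes each sample gets an issue score from some cleaning method and a binary ground-truth label (1 for a confirmed issue). The function names, the toy scores, and the choice of precision at k and average precision are illustrative assumptions; the summary above says only that standard ranking metrics are used, not which ones.

```python
from typing import List, Sequence


def rank_by_score(scores: Sequence[float]) -> List[int]:
    """Return sample indices ordered from most to least suspicious."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def precision_at_k(ranked_labels: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked samples that are true issues."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_labels[:k]) / k


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Mean of the precision values at each rank where a true issue appears."""
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical toy data: issue scores from a cleaning method and
# binary ground truth (1 = annotators confirmed an issue).
scores = [0.9, 0.1, 0.8, 0.4, 0.7]
labels = [1, 0, 0, 0, 1]

order = rank_by_score(scores)               # [0, 2, 4, 3, 1]
ranked_labels = [labels[i] for i in order]  # [1, 0, 1, 0, 0]

p_at_2 = precision_at_k(ranked_labels, 2)   # 0.5
ap = average_precision(ranked_labels)       # (1/1 + 2/3) / 2 = 0.8333...
```

A method that places confirmed issues near the top of the ranking scores higher on both metrics, so different cleaning strategies can be compared on the same annotated set without fixing a detection threshold.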
data cleaning, benchmark, image data, machine learning