Safety
A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks
The paper audits pretraining contamination of public medical vision-language models (VLMs) on benchmarks such as SLAKE-En and PathVQA. It identifies significant image-side source overlap in SLAKE-En, where 19.8% of images are flagged as contaminated, and highlights reliability problems with cohort-relative detectors for membership inference. For practitioners, the takeaway is that contaminated benchmarks can bias evaluations, making reported model performance in medical applications less trustworthy.
medical, vision-language, audit