Training
Spiking the training data to correct for test set contamination
This paper addresses test set contamination with a method called "spiking": test examples are deliberately inserted into the training data at known rates so that predictors of model memorization can be calibrated. The authors build a simulation framework based on Hubble models to evaluate correction estimators and show that estimators using both memorization and correctness information outperform naive approaches. Because calibration needs only minimal additional data, the method gives practitioners a practical way to correct test scores and make model evaluations more reliable when contamination is present.
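As a rough illustration of the calibration idea, the sketch below uses a toy setup that is not taken from the paper. It assumes each test example carries a hypothetical scalar memorization score (`mem_score`) and that a small set of known-clean examples is available alongside the spiked ones. The names `Example`, `calibrate_threshold`, and `corrected_accuracy` are invented for this example, and the simple threshold filter stands in for the paper's estimators rather than reproducing them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    mem_score: float  # hypothetical memorization signal; higher = more likely memorized
    correct: bool     # whether the model answered this test example correctly


def calibrate_threshold(spiked: List[float], clean: List[float]) -> float:
    """Pick the score cutoff that best separates spiked from known-clean examples."""
    candidates = sorted(set(spiked + clean))
    best_cut, best_acc = candidates[0], -1.0
    for cut in candidates:
        # An example is predicted "memorized" when its score is >= cut.
        hits = sum(s >= cut for s in spiked) + sum(s < cut for s in clean)
        acc = hits / (len(spiked) + len(clean))
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


def naive_accuracy(examples: List[Example]) -> float:
    """Raw test accuracy, ignoring any contamination."""
    return sum(e.correct for e in examples) / len(examples)


def corrected_accuracy(examples: List[Example], cut: float) -> float:
    """Accuracy over the examples that the calibrated predictor does not flag."""
    kept = [e for e in examples if e.mem_score < cut]
    if not kept:
        raise ValueError("every example was flagged as memorized")
    return sum(e.correct for e in kept) / len(kept)


# Toy calibration data: scores of spiked (known contaminated) and known-clean examples.
spiked_scores = [0.9, 0.8, 0.7]
clean_scores = [0.1, 0.2, 0.4]
cut = calibrate_threshold(spiked_scores, clean_scores)

# Toy test set with unknown contamination.
test_set = [
    Example(0.95, True),
    Example(0.85, True),
    Example(0.30, True),
    Example(0.20, False),
    Example(0.10, True),
    Example(0.15, False),
]
naive = naive_accuracy(test_set)
corrected = corrected_accuracy(test_set, cut)
print(f"cutoff={cut}, naive={naive:.3f}, corrected={corrected:.3f}")
```

In this toy data the two high-scoring examples are both answered correctly, so removing them lowers the estimate relative to the naive score. A filter like this uses correctness only after the memorization decision has been made; the paper reports that estimators combining both signals perform better than naive approaches.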
test set contamination correction