Training · arXiv cs.CL · 2 d ago

Spiking the training data to correct for test set contamination

This paper addresses test set contamination with a method called "spiking": intentionally contaminating the training data with test examples at known rates in order to calibrate predictors of model memorization. The authors build a simulation framework based on Hubble models to evaluate correction estimators, and show that estimators using both memorization and correctness information outperform naive approaches. The method lets practitioners calibrate test scores with minimal additional data, making model evaluations more reliable when contamination is present.
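To make the idea concrete, here is a minimal toy sketch of a correction that combines a memorization signal with correctness. It is not the paper's estimator: the function names, the method-of-moments mixture estimate, and the assumption that a small set of known-clean examples is available are all hypothetical choices made for illustration.

```python
from statistics import mean


def estimate_contamination_rate(signal_unknown, signal_clean, signal_spiked):
    """Estimate the fraction of contaminated items among the unknown test items.

    Assumption (hypothetical): the mean memorization signal on the unknown
    items is a mixture of the clean mean and the spiked (known-contaminated)
    mean, so the mixing weight can be read off by linear interpolation.
    """
    clean_mean = mean(signal_clean)
    spiked_mean = mean(signal_spiked)
    if spiked_mean <= clean_mean:
        raise ValueError("spiked items must show a stronger signal than clean items")
    rate = (mean(signal_unknown) - clean_mean) / (spiked_mean - clean_mean)
    return min(max(rate, 0.0), 1.0)  # clip to a valid proportion


def corrected_accuracy(correct_unknown, correct_spiked, rate):
    """Solve acc_observed = (1 - rate) * acc_clean + rate * acc_contaminated.

    acc_contaminated is taken from the spiked items, whose contamination
    status is known by construction.
    """
    if rate >= 1.0:
        raise ValueError("cannot correct when every item is estimated contaminated")
    acc_observed = mean(correct_unknown)
    acc_contaminated = mean(correct_spiked)
    estimate = (acc_observed - rate * acc_contaminated) / (1.0 - rate)
    return min(max(estimate, 0.0), 1.0)


# Toy inputs (made up for illustration, not results from the paper).
signal_clean = [0.0, 0.0, 0.0, 0.0]      # known-clean examples
signal_spiked = [1.0, 1.0, 1.0, 1.0]     # examples deliberately spiked into training
signal_unknown = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
correct_unknown = [1, 1, 1, 0, 0, 0, 1, 1]  # 1 = model answered correctly
correct_spiked = [1, 1, 1, 1]

rate = estimate_contamination_rate(signal_unknown, signal_clean, signal_spiked)
clean_acc = corrected_accuracy(correct_unknown, correct_spiked, rate)
```

In this toy setup the naive score is the plain mean of `correct_unknown`, while the corrected score subtracts the share of accuracy attributable to items estimated to be contaminated. A real estimator would also need to account for noise in the memorization signal and for uncertainty in the estimated rate.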

