Inference
Cross-Model Disagreement as a Label-Free Correctness Signal
The paper introduces a method for assessing the correctness of language model outputs without ground-truth labels, using cross-model disagreement as the signal. Its two metrics, Cross-Model Perplexity (CMP) and Cross-Model Entropy (CME), use a verifier model to measure how surprising and how uncertain the answers generated by a primary model appear to it. The metrics show improved performance over traditional within-model uncertainty methods on benchmarks such as MMLU, where CMP achieved a mean AUROC of 0.75 compared to 0.59 for the baseline. The approach is training-free and offers a practical option for deployment monitoring, model routing, and data filtering in language model applications.
llm, correctness, cross-model
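
The summary above names the two metrics but does not give their formulas, so the following is only a minimal sketch under assumed, standard definitions: CMP is taken to be the exponentiated mean negative log-probability that the verifier assigns to the primary model's answer tokens, and CME the mean Shannon entropy of the verifier's next-token distributions over those positions. The function names (`cross_model_perplexity`, `cross_model_entropy`, `auroc`) are hypothetical, the paper's actual definitions may differ, and the step that obtains log-probabilities from a verifier model is left out. AUROC is computed as the fraction of (incorrect, correct) answer pairs in which the incorrect answer receives the higher score.

```python
import math


def cross_model_perplexity(token_logprobs):
    """Assumed CMP: exp of the mean negative log-prob (natural log) that the
    verifier assigns to each token of the primary model's answer."""
    if not token_logprobs:
        raise ValueError("need at least one answer token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def cross_model_entropy(token_distributions):
    """Assumed CME: mean Shannon entropy (in nats) of the verifier's
    next-token distribution at each answer position."""
    if not token_distributions:
        raise ValueError("need at least one answer position")
    total = 0.0
    for dist in token_distributions:
        # Zero-probability entries contribute nothing to the entropy.
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_distributions)


def auroc(scores, labels):
    """Pairwise AUROC: probability that a positive example (label 1) scores
    higher than a negative example (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up verifier log-probs for four answers (not data from the paper).
answers = [
    [-0.1, -0.2, -0.1],   # verifier finds this answer unsurprising
    [-2.5, -3.0, -1.8],   # verifier finds this answer surprising
    [-0.3, -0.4, -0.2],
    [-1.9, -2.2, -2.7],
]
is_incorrect = [0, 1, 0, 1]  # label 1 marks an incorrect answer

cmp_scores = [cross_model_perplexity(lp) for lp in answers]
print("CMP per answer:", cmp_scores)
print("AUROC of CMP as an error detector:", auroc(cmp_scores, is_incorrect))
```

In this framing a higher CMP is read as evidence that the answer is wrong, so the labels mark incorrect answers as the positive class. Ground-truth labels appear only in the evaluation step; the scores themselves need none.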