Training
Heteroskedastic Signals in Budgeted LLM Verification: Structural Heterogeneity Limits Optimization Gains
This study examines the limits of global uncertainty signals for budgeted verification in large language models (LLMs). It finds that uncertainty signals are heteroskedastic: their discriminability varies across cost strata, and some strata discriminate poorly, which leads to suboptimal verification decisions when the signal is applied globally. The proposed cost-stratified thresholding (CST) intervention improves hit rates by up to 17 percentage points, indicating that structural heterogeneity is a critical factor limiting optimization gains for models such as Qwen3-8B, LLaMA3-8B, and GPT-4o-mini. For practitioners, the takeaway is that signal quality should be assessed per context rather than assumed to be uniform.
budgeted verification, llm, uncertainty
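The summary does not spell out how CST sets its thresholds, so the following is only a minimal sketch of the general idea, under stated assumptions: each item carries an uncertainty score and a cost-stratum label, a global rule applies one threshold to all items, and a stratified rule picks a separate threshold per stratum so that roughly the same fraction of each stratum is verified. The function names, the per-stratum quantile rule, and the toy data are all hypothetical illustrations and are not taken from the study.

```python
from collections import defaultdict


def global_select(scores, threshold):
    """Select every item whose uncertainty meets one global threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def stratum_thresholds(scores, strata, verify_fraction):
    """Assumed rule: per stratum, use the score of the k-th most uncertain
    item as the threshold, where k is about verify_fraction of the stratum."""
    by_stratum = defaultdict(list)
    for s, g in zip(scores, strata):
        by_stratum[g].append(s)
    thresholds = {}
    for g, vals in by_stratum.items():
        vals.sort(reverse=True)
        k = max(1, round(verify_fraction * len(vals)))
        thresholds[g] = vals[k - 1]
    return thresholds


def stratified_select(scores, strata, thresholds):
    """Select items whose uncertainty meets their own stratum's threshold."""
    return [i for i, (s, g) in enumerate(zip(scores, strata))
            if s >= thresholds[g]]


def hit_rate(selected, is_error):
    """Fraction of verified items that turn out to be actual errors."""
    if not selected:
        return 0.0
    return sum(is_error[i] for i in selected) / len(selected)


# Synthetic toy data (not results from the study): scores in the "high"
# stratum are shifted upward, so a global threshold spends the whole
# budget there even though most errors sit in the "low" stratum.
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.15, 0.1]
strata = ["high"] * 4 + ["low"] * 4
is_error = [1, 0, 0, 0, 1, 1, 0, 0]

global_picks = global_select(scores, threshold=0.6)
cst_picks = stratified_select(
    scores, strata, stratum_thresholds(scores, strata, verify_fraction=0.5)
)

print(hit_rate(global_picks, is_error))  # 0.25 on this toy data
print(hit_rate(cst_picks, is_error))     # 0.75 on this toy data
```

Both rules verify four items here, so the difference in hit rate comes only from where the budget is spent. The gap on this toy data is an artifact of how the example was constructed and says nothing about the size of the effect reported in the study.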