Research
A Systematic Evaluation of Black-Box Uncertainty Estimation Methods for Large Language Models
This article systematically evaluates black-box uncertainty estimation (UE) methods for large language models (LLMs), motivated by the need for reliable outputs when models are reachable only through restricted APIs. It categorizes 24 UE methods into five types and benchmarks them across four models and datasets. No single method excels universally, but hybrid approaches that combine multiple uncertainty signals tend to perform better. The accompanying unified evaluation framework and benchmark data are released to improve reproducibility and to guide practitioners in developing robust black-box UE methods for LLMs.
uncertainty, llm, black-box