Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal
The article introduces UniCR, a unified framework for confidence calibration and risk-controlled refusal in large language models (LLMs). UniCR combines several kinds of uncertainty evidence, including sequence likelihoods and feedback from tools, into a single calibrated probability that an answer is correct, and it refuses to answer when that probability is too low to keep errors within a user-defined error budget. Its key technical features are a lightweight calibration head with temperature scaling and support for API-only models. The article reports better calibration metrics and risk-coverage curves on tasks such as short-form QA and retrieval-augmented long-form QA. Because the base model does not need to be fine-tuned, UniCR is a practical option for practitioners who want more trustworthy LLM deployments.
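To make the two ideas in this summary concrete, here is a minimal sketch of temperature scaling followed by threshold selection under an error budget. It is an illustration under stated assumptions, not UniCR's implementation. It assumes each answer already has a single raw confidence logit (for example, one derived from sequence likelihood) and a held-out correctness label. The function names `fit_temperature` and `choose_threshold`, the grid search, and the plain empirical threshold rule are all hypothetical choices made for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nll(logits, labels, temperature):
    """Mean negative log-likelihood of 0/1 labels under sigmoid(logit / T)."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / temperature), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_temperature(logits, labels, grid=None):
    """Assumed approach: pick T by grid search on held-out data (lowest NLL)."""
    if grid is None:
        grid = [0.05 * k for k in range(1, 201)]  # 0.05, 0.10, ..., 10.0
    return min(grid, key=lambda t: nll(logits, labels, t))


def choose_threshold(confidences, labels, error_budget):
    """Return (threshold, coverage) for the answer-or-refuse rule.

    Scans candidate thresholds from low to high and returns the first one
    whose empirical error rate among answered items (confidence >= threshold)
    is within the budget, so the result has the highest coverage available.
    Returns (inf, 0.0) when no threshold meets the budget (refuse everything).
    This is an empirical rule on the given data, with no statistical guarantee.
    """
    n = len(confidences)
    for t in sorted(set(confidences)):
        answered = [y for c, y in zip(confidences, labels) if c >= t]
        errors = len(answered) - sum(answered)
        if errors / len(answered) <= error_budget:
            return t, len(answered) / n
    return math.inf, 0.0


if __name__ == "__main__":
    # Toy held-out set: raw logits are overconfident (|logit| = 6, 75% accurate).
    raw_logits = [6, 6, 6, 6, -6, -6, -6, -6]
    is_positive = [1, 1, 1, 0, 0, 0, 0, 1]
    T = fit_temperature(raw_logits, is_positive)
    print("fitted temperature:", T)

    # Calibrated confidences and whether each answer was actually correct.
    confidences = [0.9, 0.8, 0.7, 0.6, 0.5]
    correct = [1, 1, 0, 1, 0]
    print(choose_threshold(confidences, correct, error_budget=0.25))
```

At inference time, the model would answer only when its calibrated confidence is at least the chosen threshold and refuse otherwise. A fitted temperature above 1 softens overconfident scores, and a tighter error budget raises the threshold, which lowers coverage.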