LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data
The study introduces a method for detecting epistemic blind spots in large language models (LLMs) applied to structured clinical data. It compares Qwen 2.5 (7B parameters) with XGBoost using cross-model attribution divergence. Key findings are that the LLM's confidence scores are consistent yet misleading, that prediction accuracy shows a notable inverse difficulty effect, and that few-shot examples and SHAP-derived feature evidence improve performance and reduce the Attribution Disagreement Score (ADS). The proposed cross-model calibrator improves LLM reliability by replacing standard confidence metrics with patient-specific estimates, which addresses the cold start problem and is a step toward better epistemic self-awareness in LLMs.
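To make the idea of attribution divergence concrete, here is a minimal sketch of how a per-patient disagreement score between two sets of feature attributions might be computed. The summary above does not give the formula for ADS, so everything in this block is an assumption for illustration: the function names, the feature names, the attribution values, and the choice of total variation distance over normalized absolute attributions. It is not the study's definition.

```python
from typing import Dict, Mapping


def normalize_abs(attr: Mapping[str, float]) -> Dict[str, float]:
    """Convert signed attributions into shares of total absolute importance."""
    total = sum(abs(v) for v in attr.values())
    if total == 0:
        raise ValueError("attributions are all zero; nothing to normalize")
    return {name: abs(v) / total for name, v in attr.items()}


def attribution_disagreement(
    llm_attr: Mapping[str, float], ref_attr: Mapping[str, float]
) -> float:
    """Hypothetical disagreement score in [0, 1].

    Assumption: disagreement is the total variation distance between the
    two normalized importance profiles. 0 means identical profiles, 1 means
    the two models put all their importance on disjoint features.
    """
    p = normalize_abs(llm_attr)
    q = normalize_abs(ref_attr)
    features = set(p) | set(q)  # a feature missing from one side counts as 0
    return 0.5 * sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in features)


# Illustrative values only: importances stated by the LLM for one patient,
# and SHAP-style attributions from the tabular model for the same patient.
llm_attr = {"age": 0.5, "creatinine": 0.25, "heart_rate": 0.25}
shap_attr = {"age": 0.25, "creatinine": 0.5, "heart_rate": 0.25}

score = attribution_disagreement(llm_attr, shap_attr)
print(f"disagreement = {score:.2f}")
```

Under this assumed definition, a high score for a given patient would flag a case where the LLM's stated reasoning diverges from the tabular model's attributions. Other distance choices, such as rank correlation over feature orderings, would serve the same illustrative purpose.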