Safety
Disentangling Hallucinations: Orthogonal Semantic Projection for Robust Interpretability
The article presents Linear Semantic Attribution (LSA), a framework for improving the interpretability of Vision-Language Models by addressing semantic hallucination. It introduces Orthogonal Semantic Projection (OSP), a geometric intervention that orthogonalizes the query vector against distractor concepts to minimize hallucination, which improves the fidelity of attribution in high-dimensional embedding spaces. For practitioners, the work offers both a theoretical foundation and a practical method for making explanations more reliable in safety-critical AI applications.
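The idea of orthogonalizing a query against distractor concepts can be illustrated with a minimal sketch. The summary does not give the article's exact formulation, so this assumes the simplest reading: subtract from the query embedding its least-squares projection onto the span of the distractor embeddings, then renormalize. The function name `orthogonal_semantic_projection` and its inputs are hypothetical, not taken from the article.

```python
import numpy as np

def orthogonal_semantic_projection(query, distractors, eps=1e-12):
    """Return `query` with its component in span(distractors) removed.

    Assumption (not from the article): OSP is modeled here as projection
    onto the orthogonal complement of the distractor subspace.
    """
    q = np.asarray(query, dtype=float)                       # shape (d,)
    D = np.atleast_2d(np.asarray(distractors, dtype=float))  # shape (k, d)

    # Least-squares coefficients of q in the distractor span; lstsq also
    # handles linearly dependent distractors.
    coeffs, *_ = np.linalg.lstsq(D.T, q, rcond=None)

    # Residual is orthogonal to every distractor direction.
    residual = q - D.T @ coeffs

    # Renormalize so cosine similarities stay comparable; leave a
    # near-zero residual untouched to avoid dividing by zero.
    norm = np.linalg.norm(residual)
    return residual / norm if norm > eps else residual


# Toy usage: a 3-d query that overlaps with one distractor direction.
query = np.array([1.0, 1.0, 0.0])
distractors = np.array([[1.0, 0.0, 0.0]])
projected = orthogonal_semantic_projection(query, distractors)
```

In this toy case the projected query has zero dot product with the distractor, so any attribution score computed as a similarity against the projected query can no longer be driven by that distractor direction.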
explainable ai, semantic hallucination, xai