Research · arXiv cs.AI · 4 d ago

From Architecture to Output: Structural Origins of Hallucination in Large Language Models and the Amplifying Role of Data

The paper analyzes the structural origins of hallucination in large language models (LLMs) and traces it to three design decisions: the self-attention mechanism, the maximum likelihood estimation (MLE) training objective, and autoregressive decoding. It argues that self-attention can lead to entity confusion, that MLE optimizes for statistical plausibility rather than factual accuracy, and that autoregressive decoding propagates early errors through the rest of the generated sequence. For practitioners, the findings suggest that understanding these internal mechanisms can inform both the design of more robust models and strategies for mitigating hallucination.
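To make two of these claims concrete, here is a toy sketch. It is not the paper's method. It uses a bigram model whose probabilities are fit by MLE (relative frequency of next-token counts) and decodes greedily, one token at a time. The corpus and the names `fit_bigram_mle` and `greedy_decode` are hypothetical and chosen for illustration only. Because the correct fact appears once and an unrelated pattern appears twice, MLE favors the more frequent continuation. Once greedy decoding emits that continuation, every later step conditions on it, so the true answer is never reached. Self-attention entity confusion is not modeled here.

```python
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"

# Hypothetical toy corpus: the true fact appears once,
# an unrelated but more frequent pattern appears twice.
CORPUS = [
    "the capital of australia is canberra",
    "sydney is the largest city in australia",
    "sydney is the largest city in australia",
]

def fit_bigram_mle(corpus):
    """MLE for a bigram model: P(next | prev) = count(prev, next) / count(prev, *)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = [BOS] + sentence.split() + [EOS]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Normalize counts into conditional probabilities (relative frequencies).
    return {
        prev: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for prev, ctr in counts.items()
    }

def greedy_decode(model, prefix, max_new_tokens=10):
    """Greedy autoregressive decoding: each new token conditions on the
    previously emitted token, so the model's own outputs become its inputs."""
    tokens = [BOS] + prefix.split()
    for _ in range(max_new_tokens):
        if tokens[-1] == EOS:
            break
        dist = model.get(tokens[-1])
        if not dist:
            break
        # Pick the most probable next token (frequency, not truth).
        tokens.append(max(dist, key=dist.get))
    return [t for t in tokens if t not in (BOS, EOS)]

if __name__ == "__main__":
    model = fit_bigram_mle(CORPUS)
    print(" ".join(greedy_decode(model, "the capital of australia is")))
    # the capital of australia is the largest city in australia
    print(" ".join(greedy_decode(model, "the capital of australia is canberra")))
    # the capital of australia is canberra
```

In this toy, P(the | is) = 2/3 beats P(canberra | is) = 1/3, so the first decoded token already departs from the fact. Forcing "canberra" into the prefix shows that the rest of the sequence follows whichever token was emitted first. Real LLMs condition on the full context rather than one previous token, so this only illustrates the direction of the effect, not its magnitude.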

hallucination · llm · architecture
