The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content
This article introduces the "structural attention tax": in retrieval-augmented generation (RAG) systems, the format of the retrieved context, particularly knowledge graph (KG) triples, can distort how an LLM distributes attention, with triples capturing 2-3 times more attention per token than natural-language text. The authors present a formal framework that decomposes attention scores into semantic and structural components, and they show that the structural component can compress the attention paid to in-context demonstrations by up to 42%, regardless of semantic relevance. Practitioners therefore need both to optimize retrieval quality and to mitigate format-driven attention capture. Empirical results on Mistral-7B and LLaMA-3-8B support this conclusion and reveal a substantial performance gap in task-matched retrieval strategies.
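To make the two quantities in the summary concrete, here is a minimal sketch of how per-token attention and demonstration compression could be measured, assuming the attention weights of a single query position over the context have already been extracted from a model. The function names (`per_token_attention`, `segment_share`, `compression`), the segment labels, and all numbers are hypothetical toy values chosen for illustration; they are not the article's data, and the sketch measures raw attention mass only rather than implementing the authors' semantic/structural decomposition.

```python
from collections import defaultdict


def per_token_attention(weights, labels):
    """Average attention mass per token for each segment label."""
    if len(weights) != len(labels):
        raise ValueError("weights and labels must have the same length")
    total = defaultdict(float)
    count = defaultdict(int)
    for w, lab in zip(weights, labels):
        total[lab] += w
        count[lab] += 1
    return {lab: total[lab] / count[lab] for lab in total}


def segment_share(weights, labels, target):
    """Fraction of the total attention mass that lands on `target` tokens."""
    mass = sum(w for w, lab in zip(weights, labels) if lab == target)
    return mass / sum(weights)


def compression(baseline_share, observed_share):
    """Relative drop in a segment's attention share versus a baseline."""
    return 1.0 - observed_share / baseline_share


# Toy context (hypothetical): 4 demonstration tokens, 2 KG-triple tokens,
# 4 natural-language tokens. The weights sum to 1, like a softmax row.
labels = ["demo"] * 4 + ["kg"] * 2 + ["text"] * 4
weights = [0.10, 0.10, 0.10, 0.10, 0.15, 0.15, 0.075, 0.075, 0.075, 0.075]

avg = per_token_attention(weights, labels)
print("KG / text per-token ratio:", avg["kg"] / avg["text"])

# Assumed baseline: demonstrations hold a 0.5 share when no triples are present.
demo_share = segment_share(weights, labels, "demo")
print("demonstration compression:", compression(0.5, demo_share))
```

In practice the `weights` row would come from a model's attention output, averaged over whichever heads and layers the analysis targets; that choice is left open here because the summary does not specify it.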