Research · arXiv cs.AI · 7 d ago

Localizing Anchoring Pathways in Language Models

The study localizes anchoring pathways in language models, that is, the internal routes by which an irrelevant number in the prompt biases the model's judgment. Using a logit-difference metric and attribution-based circuit localization on 7B–8B Qwen and Llama models, it finds that edge-level methods capture anchoring signals better than node-level methods, a result that bears on how decision signals are processed inside the model. For practitioners, the findings point to internal mechanisms that could inform the design of models that are more robust to anchoring in numerical reasoning tasks.
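To make the logit-difference idea concrete, here is a minimal sketch. It assumes the metric compares the logit of one candidate answer token against another and measures how that gap moves when an anchor is added to the prompt. The summary does not give the paper's exact definition, so the function names, token ids, and logit values below are all hypothetical.

```python
from typing import Sequence


def logit_difference(logits: Sequence[float], target_id: int, baseline_id: int) -> float:
    """Logit of the target token minus the logit of the baseline token."""
    return logits[target_id] - logits[baseline_id]


def anchoring_shift(
    logits_anchored: Sequence[float],
    logits_neutral: Sequence[float],
    target_id: int,
    baseline_id: int,
) -> float:
    """Change in logit difference when the anchor is added to the prompt."""
    return (
        logit_difference(logits_anchored, target_id, baseline_id)
        - logit_difference(logits_neutral, target_id, baseline_id)
    )


# Toy next-token logits over a 4-token vocabulary (made-up numbers, not model output).
neutral = [1.0, 2.0, 0.5, 0.0]
anchored = [1.0, 2.0, 1.5, 0.0]

# Hypothetical token ids standing in for a "high" and a "low" numerical answer.
HIGH, LOW = 2, 1

shift = anchoring_shift(anchored, neutral, HIGH, LOW)
print(shift)
```

A positive shift here would mean the anchored prompt moves the model toward the "high" answer relative to the neutral prompt. A scalar of this kind is the sort of quantity that attribution methods can then distribute over nodes or edges of the network.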

