Research
Vernier: Probing Representational Misalignment Behind Lexical Gaps in Causal Reasoning
The paper introduces Vernier, a method for addressing representational misalignment in instruction-tuned language models that answer causal reasoning questions in which variable names have been replaced by placeholders. Using activation patching on Qwen-7B, Qwen-14B, and Llama-3.1-8B, the study finds that a paired-view weight update can improve accuracy in this setting, although the gain depends on model family, scale, and task. The results point to representation alignment as one way to close lexical gaps and strengthen causal reasoning in LLMs.
causal reasoning, language models, representation
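
For readers unfamiliar with activation patching, the general idea can be sketched on a toy network: run a "clean" input and a "corrupted" input, copy selected hidden activations from the clean run into the corrupted run, and see how much the output shifts. This is a minimal illustration only, not the paper's setup. The two-layer NumPy network, the random inputs standing in for named-variable and placeholder prompts, and every name below (`forward`, `patch`, `unit_effect`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))  # toy input -> hidden weights (hypothetical)
W2 = rng.normal(size=(8, 2))  # toy hidden -> output weights (hypothetical)


def forward(x, patch=None):
    """Run the toy network, optionally overwriting chosen hidden units.

    patch maps a hidden-unit index to the activation value to force there.
    Returns (output, hidden_activations).
    """
    h = np.tanh(x @ W1)
    if patch:
        h = h.copy()
        for idx, value in patch.items():
            h[idx] = value
    return h @ W2, h


# Stand-ins for a prompt with real variable names and its placeholder version.
clean_x = rng.normal(size=4)
corrupt_x = rng.normal(size=4)

clean_out, clean_h = forward(clean_x)
corrupt_out, corrupt_h = forward(corrupt_x)

# Patching every hidden unit from the clean run reproduces the clean output,
# because the output depends on the input only through the hidden layer.
full_patch = {i: clean_h[i] for i in range(clean_h.size)}
patched_out, _ = forward(corrupt_x, patch=full_patch)


def unit_effect(i):
    """Size of the output shift when only hidden unit i is patched."""
    out, _ = forward(corrupt_x, patch={i: clean_h[i]})
    return float(np.linalg.norm(out - corrupt_out))


# Rank hidden units by how strongly a single-unit patch moves the output.
effects = [unit_effect(i) for i in range(clean_h.size)]
ranking = sorted(range(len(effects)), key=lambda i: effects[i], reverse=True)
print("hidden units ranked by patching effect:", ranking)
```

On a real transformer the same pattern would be applied to a chosen layer's activations, typically through a hook mechanism, rather than to a hand-written forward pass.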