Research
Multi-component Causal Tracing in Large Language Models
This paper introduces a unified framework for multi-component causal tracing in large language models (LLMs) that intervenes on several kinds of internal components at once, such as attention heads and multi-layer perceptron (MLP) neurons. Finding which subset of components matters is a combinatorial search problem; the proposed method uses an efficient algorithm that transforms it into a continuous one, which makes it possible to identify the components that most affect performance metrics such as accuracy and fairness. The framework is relevant to practitioners who want to optimize LLMs by understanding the causal relationships within the model architecture.
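To make the idea of turning a combinatorial search into a continuous one concrete, here is a minimal sketch on a toy additive model. It is not the paper's algorithm: the function name `relaxed_component_search`, the sigmoid mask, the squared-error objective, the sparsity weight `lam`, and the toy numbers are all assumptions chosen for illustration. Each component has a "clean" and a "corrupted" contribution to a scalar output, and a mask value in (0, 1) interpolates between them. Gradient descent on the mask logits then replaces enumeration of all 2^n subsets.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relaxed_component_search(clean, corrupted, lam=0.1, lr=0.5, steps=2000):
    """Learn a soft mask over components (illustrative sketch only).

    Toy model: output = sum_i [m_i * clean_i + (1 - m_i) * corrupted_i].
    Loss = (output - clean_output)^2 + lam * sum_i m_i, so the mask is
    pushed to restore the clean output with as few components as possible.
    """
    diffs = [a - b for a, b in zip(clean, corrupted)]
    logits = [0.0] * len(diffs)  # m_i = sigmoid(logit_i), starts at 0.5
    for _ in range(steps):
        masks = [sigmoid(t) for t in logits]
        # Gap between the masked output and the fully clean output.
        residual = sum((m - 1.0) * d for m, d in zip(masks, diffs))
        for i, (m, d) in enumerate(zip(masks, diffs)):
            grad_mask = 2.0 * residual * d + lam       # dLoss/dm_i
            logits[i] -= lr * grad_mask * m * (1.0 - m)  # chain rule via sigmoid
    return [sigmoid(t) for t in logits]


# Hypothetical per-component contributions (e.g. heads or neurons).
clean = [3.0, 1.0, 0.55, 2.5, -1.0, 0.21]
corrupted = [0.0, 1.0, 0.5, 0.5, -1.0, 0.2]

masks = relaxed_component_search(clean, corrupted)
selected = [i for i, m in enumerate(masks) if m > 0.5]
print(selected)  # prints [0, 3]
```

In this toy setting only components 0 and 3 differ substantially between the clean and corrupted runs, so their masks are driven toward 1 while the sparsity penalty drives the rest toward 0. In a real LLM the additive model would be replaced by forward passes with patched activations, and the gradient would come from automatic differentiation rather than a closed form.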
causal-tracing, llm, internal-representations