Research
Jacobian Scopes: token-level causal attributions in LLMs
The article introduces Jacobian Scopes, a suite of gradient-based methods for token-level causal attribution in large language models (LLMs) that aims to show how prior tokens influence a prediction. The methods draw on perturbation theory and information geometry to quantify the effect of each input token on model outputs, including logits and uncertainty metrics. Open-source implementations and a cloud-hosted demo let practitioners try the methods, which can improve interpretability and help address biases in model predictions in applications such as instruction understanding and translation.
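The summary does not give the formulas behind Jacobian Scopes, so the sketch below is a generic illustration of gradient-based token attribution, not the authors' method. It scores each input token by the norm of the Jacobian of one target logit with respect to that token's embedding. Everything here is an assumption made for illustration: the toy "model" (`toy_logit`), the helper `token_attributions`, and the `weights` and `readout` parameters are hypothetical, and the Jacobian is approximated with central finite differences rather than automatic differentiation.

```python
import math

def toy_logit(embeddings, weights, readout):
    """Hypothetical toy 'model' (not an LLM): a weighted sum of token
    embeddings, passed through tanh, then a linear readout to one logit."""
    dim = len(readout)
    pooled = [sum(w * e[i] for w, e in zip(weights, embeddings))
              for i in range(dim)]
    return sum(r * math.tanh(p) for r, p in zip(readout, pooled))

def token_attributions(embeddings, weights, readout, eps=1e-5):
    """Per-token score: L2 norm of d(logit)/d(embedding_t),
    estimated with central finite differences."""
    scores = []
    for t, emb in enumerate(embeddings):
        grad = []
        for i in range(len(emb)):
            # Perturb one coordinate of one token's embedding in each direction.
            plus = [list(e) for e in embeddings]
            minus = [list(e) for e in embeddings]
            plus[t][i] += eps
            minus[t][i] -= eps
            diff = (toy_logit(plus, weights, readout)
                    - toy_logit(minus, weights, readout))
            grad.append(diff / (2 * eps))
        scores.append(math.sqrt(sum(g * g for g in grad)))
    return scores

# Three tokens with 3-dimensional embeddings (made-up numbers).
embeddings = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.1, 0.1, 0.1]]
weights = [0.1, 0.3, 0.6]      # assumed mixing weights; later tokens count more
readout = [1.0, -0.5, 0.25]    # assumed readout vector for the target logit
scores = token_attributions(embeddings, weights, readout)
print(scores)
```

In this toy model the gradient for token `t` is its mixing weight times a vector shared by all tokens, so the scores come out proportional to `weights` and the last token receives the largest attribution. With a real LLM the same quantity would typically be computed by backpropagating from the chosen logit to the input embeddings.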
llm causal attribution