Research
Interactions Between Crosscoder Features: A Compact Proofs Perspective
The paper formalizes feature interactions in crosscoders and uses compact proofs to show how those interactions affect model performance. It introduces an interaction term for multilayer perceptron (MLP) layers that can serve as a differentiable loss penalty, enabling "computationally sparse" crosscoders. With only one feature retained per data point, these crosscoders maintain 60% of MLP performance, compared with 10% for conventional methods. For practitioners, the work offers new metrics and techniques for improving model efficiency while retaining performance, with potential benefits for interpretability and feature clustering.
crosscoder, feature interaction, proofs