Research · arXiv cs.AI · 4 d ago

Interactions Between Crosscoder Features: A Compact Proofs Perspective

The paper formalizes interactions between crosscoder features through the lens of compact proofs and shows how those interactions bear on model performance. It introduces an interaction term for multilayer perceptron (MLP) layers that can serve as a differentiable loss penalty. Training with this penalty yields "computationally sparse" crosscoders that retain 60% of MLP performance when only one feature is kept per data point, compared with 10% for conventionally trained crosscoders. For practitioners, the work offers new metrics and techniques for trading off sparsity against retained performance, and it may also improve interpretability and feature clustering.
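The summary does not give the paper's formula for the interaction term, so the sketch below is only a hedged illustration of how an interaction measure can be added to a crosscoder training loss. It assumes a toy one-hidden-layer ReLU MLP and a crosscoder that reads the MLP input and reconstructs both the input and the output. The names `mlp`, `interaction_penalty`, `crosscoder_loss`, `l1_coeff`, and `lam` are hypothetical, and the interaction term itself (the squared gap between the MLP's response to all active features together and the sum of its responses to each feature alone) is a stand-in, not the paper's definition. Under this toy measure, a data point with a single active feature has zero interaction by construction.

```python
import numpy as np


def mlp(x, W_in, b_in, W_out):
    """Toy one-hidden-layer ReLU MLP (an assumption, not the paper's model)."""
    return W_out @ np.maximum(W_in @ x + b_in, 0.0)


def interaction_penalty(acts, dec_in, W_in, b_in, W_out):
    """Hypothetical interaction term, NOT the paper's definition.

    Squared gap between the MLP's response to all active features at once
    and the sum of its responses to each active feature alone:
        || g(sum_i a_i d_i) - sum_i g(a_i d_i) ||^2,
    where g(v) = mlp(v) - mlp(0) so the bias is not counted once per feature.
    """
    base = mlp(np.zeros(dec_in.shape[0]), W_in, b_in, W_out)

    def g(v):
        return mlp(v, W_in, b_in, W_out) - base

    contribs = dec_in * acts  # column i holds a_i * d_i (acts broadcast over columns)
    joint = g(contribs.sum(axis=1))
    separate = sum(g(contribs[:, i]) for i in np.flatnonzero(acts))
    return float(np.sum((joint - separate) ** 2))


def crosscoder_loss(x, enc, b_enc, dec_in, dec_out, mlp_params, l1_coeff, lam):
    """Reconstruction + L1 sparsity + lam * hypothetical interaction penalty."""
    acts = np.maximum(enc @ x + b_enc, 0.0)  # sparse feature activations
    y = mlp(x, *mlp_params)                  # true MLP output to reconstruct
    recon = np.sum((dec_in @ acts - x) ** 2) + np.sum((dec_out @ acts - y) ** 2)
    sparsity = l1_coeff * np.sum(np.abs(acts))
    penalty = interaction_penalty(acts, dec_in, *mlp_params)
    return float(recon + sparsity + lam * penalty)


# A single active feature has zero interaction by construction.
rng = np.random.default_rng(0)
W_in, b_in, W_out = rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=(4, 8))
dec_in = rng.normal(size=(4, 3))
print(interaction_penalty(np.array([0.0, 1.5, 0.0]), dec_in, W_in, b_in, W_out))  # 0.0
```

NumPy keeps the sketch dependency-light. Written in an autodiff framework such as JAX or PyTorch, the same expression could be minimized by gradient descent, since the ReLU makes it differentiable almost everywhere.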

crosscoder · feature interaction · proofs
