Multimodal
Cascaded Sparse Autoencoders Learn Multi-Level Visual Concepts in Multimodal LLMs
The paper introduces Cascaded Sparse Autoencoders (CSAEs), an architecture for improving the interpretability of visual representations in Multimodal Large Language Models (MLLMs). A CSAE extends a standard Sparse Autoencoder (SAE) by training a second-level SAE directly on the decoder weights of the first-level SAE, which lets it learn hierarchical visual concepts while avoiding limitations the authors attribute to existing methods. Experiments on models such as Qwen3-VL, Gemma-3, and LLaVA indicate that CSAEs achieve better hierarchical concept coherence and support effective group-level interventions on MLLM outputs, which matters to practitioners working on the interpretability and usability of vision-language systems.
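The cascade described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the ReLU encoder, the L1 sparsity penalty, the plain gradient-descent loop, the layer sizes, the random stand-in activations, and the argmax grouping step are all assumptions made for illustration. The only element taken from the summary is that the second-level SAE is trained on the decoder weights of the first-level SAE.

```python
import numpy as np

def init_sae(d_in, d_hidden, rng):
    """Random ReLU SAE (assumed form); decoder rows start as unit-norm directions."""
    W_dec = rng.normal(size=(d_hidden, d_in))
    W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)
    return {"W_enc": 0.1 * W_dec.T.copy(), "b_enc": np.zeros(d_hidden),
            "W_dec": W_dec, "b_dec": np.zeros(d_in)}

def sae_forward(p, x):
    """Return sparse codes z and the reconstruction x_hat."""
    z = np.maximum(0.0, x @ p["W_enc"] + p["b_enc"])
    x_hat = z @ p["W_dec"] + p["b_dec"]
    return z, x_hat

def train_sae(x, d_hidden, l1=1e-3, lr=5e-3, steps=300, seed=0):
    """Full-batch gradient descent on mean squared error plus an L1 penalty on z."""
    rng = np.random.default_rng(seed)
    n, d_in = x.shape
    p = init_sae(d_in, d_hidden, rng)
    for _ in range(steps):
        z, x_hat = sae_forward(p, x)
        g_out = 2.0 * (x_hat - x) / n                       # dLoss/dx_hat
        g_pre = (g_out @ p["W_dec"].T + l1 / n) * (z > 0)   # back through ReLU
        grads = {"W_dec": z.T @ g_out, "b_dec": g_out.sum(axis=0),
                 "W_enc": x.T @ g_pre, "b_enc": g_pre.sum(axis=0)}
        for k in p:
            p[k] -= lr * grads[k]
    return p

# Stand-in for visual-token activations from an MLLM (random data, hypothetical sizes).
rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 16))

# Level 1: an ordinary SAE on the activations.
sae1 = train_sae(acts, d_hidden=64)

# Level 2: the training set is the level-1 decoder matrix itself,
# one row per first-level feature direction.
concept_dirs = sae1["W_dec"]                 # shape (64, 16)
sae2 = train_sae(concept_dirs, d_hidden=8, seed=1)

# Assumed grouping rule: assign each first-level feature to the
# second-level unit that responds most strongly to its direction.
codes2, _ = sae_forward(sae2, concept_dirs)
groups = codes2.argmax(axis=1)               # shape (64,), values in [0, 8)
```

In this sketch, `groups` maps each first-level feature to one second-level unit, which is one simple way to obtain the kind of feature groups that a group-level intervention could act on. The paper may define group membership differently.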
visual concepts, autoencoders, mlm