Training
MODE: Modality-Decomposed Expert-Level Mixed-Precision Quantization for MoE Multimodal LLMs
The article presents MODE, a modality-decomposed, expert-level mixed-precision quantization framework for Mixture-of-Experts Multimodal Large Language Models (MoE-MLLMs). It addresses a bias in expert importance estimation: vision tokens dominate the input and much of the visual data is redundant, which distorts the choice of which experts receive which precision. Experimental results indicate that MODE incurs an average performance loss of only 2.9% at W3A16 (3-bit weights, 16-bit activations), with significant improvements at lower bit-widths.
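The summary does not spell out how MODE scores experts, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes expert importance is approximated by routing frequency, that "modality-decomposed" means scoring vision and text tokens separately before combining them, and that experts are split between two bit-widths. The function names, the 4-bit/2-bit split, and the toy routing data are all hypothetical.

```python
import numpy as np

def pooled_importance(expert_ids, num_experts):
    """Routing frequency over all tokens; vision tokens dominate if they are the majority."""
    counts = np.bincount(np.asarray(expert_ids), minlength=num_experts)
    return counts / counts.sum()

def modality_balanced_importance(expert_ids, is_vision, num_experts):
    """Routing frequency computed per modality, then averaged (assumed scheme)."""
    expert_ids = np.asarray(expert_ids)
    is_vision = np.asarray(is_vision, dtype=bool)
    scores = np.zeros(num_experts)
    present = 0
    for mask in (is_vision, ~is_vision):
        if mask.any():
            counts = np.bincount(expert_ids[mask], minlength=num_experts)
            scores += counts / counts.sum()  # each modality contributes equally
            present += 1
    return scores / max(present, 1)

def assign_bits(importance, high_bits=4, low_bits=2, num_high=None):
    """Give the top-ranked experts high_bits and the rest low_bits."""
    importance = np.asarray(importance)
    if num_high is None:
        num_high = len(importance) // 2
    order = np.argsort(-importance, kind="stable")
    bits = np.full(len(importance), low_bits)
    bits[order[:num_high]] = high_bits
    return bits

# Toy routing trace (hypothetical): 90 vision tokens, 10 text tokens, 4 experts.
expert_ids = np.array([0] * 50 + [1] * 40 + [2] * 8 + [3] * 2)
is_vision = np.array([True] * 90 + [False] * 10)

pooled_bits = assign_bits(pooled_importance(expert_ids, 4))
balanced_bits = assign_bits(modality_balanced_importance(expert_ids, is_vision, 4))

print(pooled_bits.tolist())    # [4, 4, 2, 2]
print(balanced_bits.tolist())  # [4, 2, 4, 2]
```

In this toy trace, pooled scoring gives both high-precision slots to the experts that serve vision tokens, while the per-modality scoring keeps the main text expert at 4 bits. Both allocations average 3 bits per expert.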
quantization, mixed-precision, moe, llm, framework