Rethinking Multimodal Fusion for Time Series: Text Modalities Need Constrained Fusion
The paper proposes the Controlled Fusion Adapter (CFA), a method for integrating text into time series forecasting models. CFA uses low-rank adapters to filter out irrelevant textual information before fusion. This targets a weakness of naive fusion techniques such as addition or concatenation, which often perform worse than unimodal (time-series-only) models. Across more than 20,000 experiments on a range of datasets, the authors find that constrained fusion methods consistently perform better than naive fusion. For practitioners, the takeaway is that text should be integrated in a controlled way rather than simply combined with the time series representation.
multimodal learning, time series, fusion
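The difference between naive and constrained fusion can be illustrated with a small sketch. It assumes, for illustration only, that the adapter is a rank-r bottleneck (`up` times `down`) applied to the text embedding before it is added to the time series embedding. The function names and the `scale` parameter are hypothetical and are not taken from the paper. In a real model, `down` and `up` would be learned, and the actual CFA architecture may differ.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add_fusion(h_ts: Vector, h_text: Vector) -> Vector:
    """Naive additive fusion: every text dimension flows into the output."""
    return [a + b for a, b in zip(h_ts, h_text)]


def concat_fusion(h_ts: Vector, h_text: Vector) -> Vector:
    """Naive concatenation: the downstream head sees the raw text vector."""
    return h_ts + h_text


def low_rank_adapter(h_text: Vector, down: Matrix, up: Matrix) -> Vector:
    """Hypothetical adapter: project text into an r-dim bottleneck and back.

    `down` is r x d and `up` is d x r. Any component of h_text orthogonal
    to the rows of `down` is discarded, so only an r-dimensional slice of
    the text signal can reach the fusion step.
    """
    return matvec(up, matvec(down, h_text))


def constrained_fusion(
    h_ts: Vector, h_text: Vector, down: Matrix, up: Matrix, scale: float = 1.0
) -> Vector:
    """Add only the low-rank-filtered text signal to the time series embedding."""
    filtered = low_rank_adapter(h_text, down, up)
    return [a + scale * b for a, b in zip(h_ts, filtered)]


if __name__ == "__main__":
    h_ts = [1.0, 0.0, 0.0, 0.0]      # toy time series embedding (d = 4)
    h_text = [0.5, 2.0, -3.0, 0.0]   # toy text embedding with "noisy" dims
    down = [[1.0, 0.0, 0.0, 0.0]]    # rank r = 1: keep one text direction
    up = [[1.0], [0.0], [0.0], [0.0]]

    print(add_fusion(h_ts, h_text))                    # noisy dims leak through
    print(constrained_fusion(h_ts, h_text, down, up))  # noisy dims filtered out
```

With these toy values, additive fusion passes the large text components (2.0 and -3.0) straight into the fused vector, while the rank-1 adapter lets through only the component along the single retained direction. Setting `scale=0.0` recovers the unimodal embedding, which shows how a constrained design can fall back toward the time-series-only model when text is unhelpful.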