FusionRS: A Large-Scale RGB-Infrared Remote Sensing Dataset for Dual-Modal Vision-Language Foundation Models
FusionRS is introduced as the first large-scale RGB-infrared-text dataset for dual-modal vision-language learning in remote sensing. It contains aligned RGB and infrared image pairs, and each pair is annotated with both a conventional caption and an infrared-specific caption. This supervision improves RGB-IR alignment and performance on tasks such as infrared-to-text retrieval and dual-modal captioning. The dataset supports training of both CLIP-style contrastive models and generative vision-language models. The results emphasize that modality-specific textual supervision is necessary for effective RGB-infrared representation learning, which matters directly to practitioners building advanced remote sensing applications.
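The summary does not specify the training objective. The sketch below therefore assumes a standard CLIP-style symmetric InfoNCE loss and shows how conventional and infrared-specific captions might each supervise their own image modality. It also includes a Recall@K metric of the kind typically reported for infrared-to-text retrieval. All function names, the pairing scheme (RGB with conventional captions, IR with infrared-specific captions), and the temperature of 0.07 are illustrative assumptions, not the FusionRS authors' method. Embeddings are assumed to be precomputed by some image and text encoders.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project embeddings onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of img_emb is the positive match for row i of txt_emb."""
    img, txt = l2_normalize(img_emb), l2_normalize(txt_emb)
    logits = img @ txt.T / temperature          # (N, N) similarity matrix
    idx = np.arange(len(img))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)

def dual_modal_loss(rgb_emb, ir_emb, rgb_cap_emb, ir_cap_emb, temperature=0.07):
    # Hypothetical pairing: RGB images vs. conventional captions,
    # infrared images vs. infrared-specific captions, losses summed.
    return (clip_loss(rgb_emb, rgb_cap_emb, temperature)
            + clip_loss(ir_emb, ir_cap_emb, temperature))

def recall_at_k(query_emb, gallery_emb, k=1):
    # Fraction of queries whose true match (same row index) is in the top-k results.
    sims = l2_normalize(query_emb) @ l2_normalize(gallery_emb).T
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = (topk == np.arange(len(query_emb))[:, None]).any(axis=1)
    return float(hits.mean())
```

For infrared-to-text retrieval, `query_emb` would hold infrared image embeddings and `gallery_emb` the matching caption embeddings. Summing two separate contrastive terms keeps each image modality tied to text written for it, which reflects the summary's point about modality-specific supervision. A real system might instead weight the terms or add an RGB-IR alignment term.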
remote sensing, vision-language, dataset