RAGarXiv cs.AI — 10 d ago

FusionRS: A Large-Scale RGB-Infrared Remote Sensing Dataset for Dual-Modal Vision-Language Foundation Models

FusionRS is introduced as the first large-scale RGB-infrared-text dataset aimed at enhancing dual-modal vision-language learning in remote sensing. It includes aligned RGB and infrared image pairs, each paired with conventional and infrared-specific captions, allowing for improved RGB-IR alignment and performance in tasks such as infrared-to-text retrieval and dual-modal captioning. This dataset facilitates the training of CLIP-style models and generative vision-language models, emphasizing the necessity of modality-specific textual supervision for effective RGB-infrared representation learning, which is crucial for practitioners developing advanced remote sensing applications.

remote sensingvision-languagedatasetrelevance 0.00 · engagement 0.00

Read at source ↗← all news