Multimodal · arXiv cs.AI · 15 days ago

TerraMind: Large-Scale Generative Multimodality for Earth Observation

TerraMind is introduced as the first generative multimodal foundation model designed specifically for Earth observation. It is pretrained on nine geospatial modalities with a dual-scale early fusion approach that combines token-level and pixel-level representations of the data, which strengthens its zero-shot and few-shot capabilities. The model also introduces "Thinking-in-Modalities" (TiM), in which it generates artificial data in additional modalities during fine-tuning and inference to improve its outputs. TerraMind outperforms existing models on community benchmarks such as PANGAEA. Its pretraining dataset, model weights, and code are openly available, making it a practical resource for practitioners in the field.
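
The idea of dual-scale early fusion can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not TerraMind's implementation. It assumes that pixel-level modalities are split into patches and linearly projected, that token-level modalities arrive as discrete ids looked up in an embedding table, and that every element gets a per-modality embedding before all of them are concatenated into one input sequence for a single transformer. The function names, the modality names `optical` and `land_cover`, and the toy weights are all hypothetical. Real tokenizers, embedding sizes, and positional encodings are omitted.

```python
from typing import Dict, List

# Hypothetical toy sketch of early fusion; not TerraMind's actual code.
Vector = List[float]
Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> List[Vector]:
    """Split a single-channel H x W image into flattened, non-overlapping patches."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return patches


def linear(x: Vector, weight: Matrix) -> Vector:
    """Project x with a (d_out x d_in) weight matrix."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weight]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


def early_fuse(
    pixel_inputs: Dict[str, Matrix],
    token_inputs: Dict[str, List[int]],
    patch: int,
    patch_proj: Matrix,
    token_table: Matrix,
    modality_emb: Dict[str, Vector],
) -> List[Vector]:
    """Embed every modality into the same d-dim space and concatenate into one sequence."""
    sequence: List[Vector] = []
    # Pixel-level modalities: patch, project, then tag with a modality embedding.
    for name, image in pixel_inputs.items():
        for p in patchify(image, patch):
            sequence.append(add(linear(p, patch_proj), modality_emb[name]))
    # Token-level modalities: look up each discrete id, then tag with a modality embedding.
    for name, ids in token_inputs.items():
        for t in ids:
            sequence.append(add(token_table[t], modality_emb[name]))
    return sequence


# Toy usage with d = 2 and 2x2 patches (all values hypothetical).
image = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
    [3.0, 3.0, 4.0, 4.0],
    [3.0, 3.0, 4.0, 4.0],
]
patch_proj = [[0.25] * 4, [0.0] * 4]      # dim 0 = patch mean, dim 1 = 0
token_table = [[0.0, 1.0], [1.0, 1.0]]    # embeddings for token ids 0 and 1
modality_emb = {"optical": [0.0, 0.0], "land_cover": [0.0, 0.5]}

seq = early_fuse({"optical": image}, {"land_cover": [0, 1, 1]},
                 2, patch_proj, token_table, modality_emb)
print(len(seq), seq)
```

Fusing at the input lets one encoder attend across all modalities from the first layer. Late fusion, by contrast, merges features produced by separate per-modality encoders.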

earth observation · generative models