Multimodal
TerraMind: Large-Scale Generative Multimodality for Earth Observation
TerraMind is presented as the first generative multimodal foundation model designed specifically for Earth observation. It is pretrained on dual-scale representations that combine token-level and pixel-level data across nine geospatial modalities. This dual-scale early fusion approach enables a range of zero-shot and few-shot applications. The model also introduces "Thinking-in-Modalities" (TiM), the ability to generate additional artificial data during fine-tuning and inference to improve its output. TerraMind exceeds prior state-of-the-art results on community benchmarks such as PANGAEA, and its pretraining dataset, model weights, and code are openly available.
earth observation, generative models