Agents
MapDream: Task-Driven Map Learning for Vision-Language Navigation
MapDream is a map-in-the-loop framework for Vision-Language Navigation (VLN) that synthesizes bird's-eye-view (BEV) maps shaped by the navigation task itself, rather than relying on pre-constructed maps. It couples map generation with action prediction, producing a compact three-channel BEV map that emphasizes navigation-critical features. The model achieves state-of-the-art performance on the R2R-CE and RxR-CE datasets, which suggests that tying learned map representations to navigation objectives can improve the navigation capabilities of AI agents.
navigation, vision-language, map learning