Agents · arXiv cs.AI · 10 d ago

MapDream: Task-Driven Map Learning for Vision-Language Navigation

MapDream is a map-in-the-loop framework for Vision-Language Navigation (VLN) that synthesizes bird's-eye-view (BEV) maps shaped by the navigation task itself instead of relying on pre-constructed maps. It couples map generation with action prediction, producing a compact three-channel BEV map that emphasizes navigation-critical features. The model reports state-of-the-art performance on the R2R-CE and RxR-CE benchmarks, suggesting that tying learned map representations to navigation objectives can improve how AI agents navigate.

navigation · vision-language · map learning
