ai-digest.dev
last updated 4 h ago
Agents · arXiv cs.AI · 10 d ago

MapDream: Task-Driven Map Learning for Vision-Language Navigation

MapDream introduces a map-in-the-loop framework for Vision-Language Navigation (VLN) that synthesizes bird's-eye-view (BEV) maps shaped by the navigation task itself, rather than relying on pre-constructed maps. The approach learns map generation jointly with action prediction, producing a compact three-channel BEV map that retains navigation-critical features. The model reports state-of-the-art results on the R2R-CE and RxR-CE benchmarks, suggesting that learning map representations together with navigation objectives can improve how AI agents navigate.
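The summary does not say what MapDream's three channels encode or how its training losses are defined. The Python sketch below is therefore only an illustrative assumption of what "combining map generation with action prediction" can look like. It uses a hypothetical three-channel grid (`navigable`, `obstacle`, `agent_path` are invented channel names) and a hypothetical joint objective that adds an action cross-entropy term to a weighted map reconstruction term. None of these names or formulas come from the paper.

```python
import math
from dataclasses import dataclass, field

# Hypothetical channel layout: the source only states that the BEV map
# has three channels emphasizing navigation-critical features.
CHANNELS = ("navigable", "obstacle", "agent_path")


@dataclass
class BEVMap:
    """A square BEV grid stored as data[channel][row][col], values in [0, 1]."""
    size: int
    data: list = field(default_factory=list)

    @classmethod
    def empty(cls, size: int) -> "BEVMap":
        grid = [[[0.0] * size for _ in range(size)] for _ in CHANNELS]
        return cls(size, grid)


def map_loss(pred: BEVMap, target: BEVMap) -> float:
    """Mean squared error over every channel and cell (assumed map-generation loss)."""
    total, count = 0.0, 0
    for c in range(len(CHANNELS)):
        for r in range(pred.size):
            for col in range(pred.size):
                diff = pred.data[c][r][col] - target.data[c][r][col]
                total += diff * diff
                count += 1
    return total / count


def action_loss(logits: list, target_action: int) -> float:
    """Cross-entropy of the target action under softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_action]


def joint_loss(pred_map: BEVMap, target_map: BEVMap,
               logits: list, target_action: int, weight: float = 1.0) -> float:
    """Hypothetical joint objective: action term plus weighted map term."""
    return action_loss(logits, target_action) + weight * map_loss(pred_map, target_map)


if __name__ == "__main__":
    target = BEVMap.empty(4)
    pred = BEVMap.empty(4)
    pred.data[1][2][2] = 0.5  # predicted obstacle cell that the target lacks
    print(joint_loss(pred, target, [1.2, 0.3, -0.5, 0.0], target_action=0, weight=0.5))
```

Training a map head and a policy head under one summed loss is simply the most common way to couple two objectives; MapDream's actual coupling may differ.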

Tags: navigation · vision-language · map learning