Multimodal · arXiv cs.AI · 15 d ago

See-and-Reach: Precise Vision-Language Navigation for UAVs within the Field of View

The paper introduces UAV-VLN-FOV, a target-visible navigation task that isolates the see-and-reach stage for UAVs, enabling a more precise evaluation of how well they ground visible targets and execute 3D motion. It also presents 3DG-VLN, a vision-language waypoint prediction framework that uses dynamic 3D direction cues and high-resolution front and downward views to improve visual grounding and spatial alignment, with a reported 13.82% improvement in success rate over existing UAV-VLN baselines. For practitioners, the work provides a dedicated benchmark and source code for building more accurate UAV navigation systems.

uav · vision-language · navigation
