ai-digest.dev
Agents · arXiv cs.AI · 10 d ago

Binary Tracking for Spatial QA and Navigation with Open Vision-Language Models

BinTrack is an open-source spatial-localization agent that lets service robots answer spatial questions. It runs a binary search over trajectory segments, reaching up to 22.8% higher accuracy than existing open-source models and matching closed-source models such as GPT-4o on the SpaceLocQA benchmark. BinTrack also delivers more than a 1.5x inference speedup. The authors additionally introduce the GangnamLoop benchmark, which evaluates the system in varied outdoor conditions. All source code and datasets are publicly available.
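The summary does not spell out how the binary search over trajectory segments works, so the following is only a minimal sketch of the general idea, not BinTrack's actual implementation. It assumes a hypothetical oracle `seen_in(segment)` (standing in for a vision-language model query) that reports whether the target appears anywhere in a slice of the robot's recorded trajectory; the function name `locate_first_frame` and the string "frames" are illustrative assumptions as well.

```python
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def locate_first_frame(
    trajectory: Sequence[Frame],
    seen_in: Callable[[Sequence[Frame]], bool],
) -> int:
    """Return the index of the earliest frame containing the target, or -1.

    `seen_in` is a hypothetical oracle (e.g. a VLM call) that answers
    whether the target is visible anywhere in the given segment.
    """
    # One query over the whole trajectory rules out the "not present" case.
    if not trajectory or not seen_in(trajectory):
        return -1

    # Invariant: the target appears somewhere in trajectory[lo:hi].
    lo, hi = 0, len(trajectory)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if seen_in(trajectory[lo:mid]):
            hi = mid  # target is in the left half
        else:
            lo = mid  # otherwise it must be in the right half
    return lo


# Toy usage: frames are place labels and the oracle is a membership test.
frames = ["hall", "kitchen", "desk", "door"]
index = locate_first_frame(frames, lambda seg: "desk" in seg)
```

Under these assumptions the oracle is queried roughly log2(n) + 1 times for a trajectory of n frames instead of once per frame, which is the kind of saving a segment-level binary search is meant to provide.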

spatial QA · navigation · open source