ai-digest.dev
Agents · arXiv cs.AI · 8 d ago

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

MobilityBench is a new benchmark for evaluating LLM-based route-planning agents in real-world mobility scenarios. It is built from large-scale, anonymized user queries from Amap that cover diverse routing intents across multiple cities, and it includes a deterministic API-replay sandbox for reproducible evaluation. The reported results show that LLMs handle basic tasks well but significantly underperform on preference-constrained route planning, which points to personalized mobility as an area needing further work. The benchmark data and tools are publicly available on GitHub.
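For readers unfamiliar with the idea, a deterministic API-replay sandbox can be sketched roughly as follows. This is an illustrative assumption about the general technique, not MobilityBench's actual implementation: the class `ReplaySandbox`, the endpoint name `route/driving`, and the response fields are all hypothetical. The point is that the agent's tool calls are answered from previously recorded responses rather than a live map service, so repeated runs see identical data.

```python
import hashlib
import json


class ReplaySandbox:
    """Hypothetical replay cache: serves recorded API responses instead of live calls."""

    def __init__(self):
        self._recorded = {}

    @staticmethod
    def _key(endpoint, params):
        # Canonical JSON (sorted keys) so parameter order does not change the key.
        payload = json.dumps({"endpoint": endpoint, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def record(self, endpoint, params, response):
        # Store a response captured once from the real service.
        self._recorded[self._key(endpoint, params)] = response

    def call(self, endpoint, params):
        # Replay only: an unrecorded request fails loudly instead of hitting the network.
        key = self._key(endpoint, params)
        if key not in self._recorded:
            raise KeyError(f"no recorded response for {endpoint} with {params}")
        return self._recorded[key]


# Hypothetical endpoint and payload, for illustration only.
sandbox = ReplaySandbox()
sandbox.record("route/driving", {"origin": "A", "destination": "B"}, {"duration_s": 900})

first = sandbox.call("route/driving", {"destination": "B", "origin": "A"})
second = sandbox.call("route/driving", {"origin": "A", "destination": "B"})
```

Because the lookup key is derived from a canonical form of the request, the same query returns the same recorded answer on every run, which is what makes scores comparable across agents and over time.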

route-planning · llm