Agents · arXiv cs.AI · 8 d ago

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

MobilityBench is a new benchmark for evaluating LLM-based route-planning agents in real-world mobility scenarios. It comprises large-scale, anonymized user queries from Amap that cover diverse routing intents across multiple cities, and it includes a deterministic API-replay sandbox for reproducible evaluation. The benchmark shows that LLMs handle basic tasks well but significantly underperform on preference-constrained route planning, which points to room for improvement in personalized mobility solutions. The benchmark data and tools are publicly available on GitHub.
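
To make the idea of a deterministic API-replay sandbox concrete, here is a minimal Python sketch of the general record-and-replay technique. It is not MobilityBench's implementation: the `ReplaySandbox` class, the `route` endpoint name, the parameter names, and the response fields are all hypothetical and chosen only for illustration. The sketch assumes that responses were recorded ahead of time and that a request is identified by a hash of its endpoint and parameters.

```python
import hashlib
import json


class ReplaySandbox:
    """Hypothetical record-and-replay store for API calls (illustrative only)."""

    def __init__(self, recordings=None):
        # Maps a request hash to the response recorded for that request.
        self.recordings = dict(recordings or {})

    @staticmethod
    def _key(endpoint, params):
        # Canonical JSON (sorted keys) makes the key independent of parameter order.
        payload = json.dumps({"endpoint": endpoint, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def record(self, endpoint, params, response):
        # Store a response captured from a live call (the live call is not shown here).
        self.recordings[self._key(endpoint, params)] = response

    def call(self, endpoint, params):
        # Replay only: an unseen request fails loudly instead of reaching a live API.
        key = self._key(endpoint, params)
        if key not in self.recordings:
            raise KeyError(f"no recorded response for {endpoint} with {params}")
        return self.recordings[key]


# Hypothetical endpoint, parameters, and response values.
sandbox = ReplaySandbox()
sandbox.record(
    "route",
    {"origin": "A", "destination": "B", "mode": "drive"},
    {"distance_km": 12.4, "duration_min": 27},
)

# The same logical request returns the same response, whatever the parameter order.
first = sandbox.call("route", {"mode": "drive", "destination": "B", "origin": "A"})
second = sandbox.call("route", {"origin": "A", "destination": "B", "mode": "drive"})
print(first == second)
```

Because every response comes from a fixed recording, two evaluation runs that issue the same requests see identical data, which is what makes agent scores comparable across runs. Raising an error on an unrecorded request is one possible design choice; a real sandbox might handle such requests differently.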

