Agents
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
MobilityBench is a newly introduced benchmark for evaluating route-planning agents built on large language models (LLMs) in real-world mobility scenarios. It consists of a large-scale set of anonymized user queries from Amap that covers diverse routing intents across multiple cities. It also provides a deterministic API-replay sandbox so that evaluations are reproducible. Results on the benchmark show that LLMs perform well on basic routing tasks but fall well short on preference-constrained route planning, which points to personalized mobility as an area needing further development. The benchmark data and tools are publicly available on GitHub.
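The summary does not describe how MobilityBench's sandbox is implemented. The following is only a minimal sketch of the general record/replay idea behind a deterministic API-replay sandbox. It assumes that each tool call is identified by a tool name plus JSON-serializable parameters. The names `ReplaySandbox` and `fake_route_api` are hypothetical and are not part of MobilityBench.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Optional


class ReplaySandbox:
    """Hypothetical record/replay wrapper for tool APIs (illustrative only)."""

    def __init__(
        self,
        live_call: Optional[Callable[[str, Dict[str, Any]], Any]] = None,
        recorded: Optional[Dict[str, Any]] = None,
    ):
        # live_call is the real API client; None means pure replay mode.
        self.live_call = live_call
        # Responses recorded earlier, keyed by a hash of the canonical request.
        self.cache: Dict[str, Any] = dict(recorded or {})

    @staticmethod
    def _key(tool: str, params: Dict[str, Any]) -> str:
        # sort_keys=True canonicalizes the request, so parameter order
        # does not change the lookup key.
        payload = json.dumps({"tool": tool, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, tool: str, params: Dict[str, Any]) -> Any:
        key = self._key(tool, params)
        if key in self.cache:
            # Replay: identical requests always get the identical stored response.
            return self.cache[key]
        if self.live_call is None:
            # In replay-only mode an unseen request is an error, not a live call.
            raise KeyError(f"no recorded response for {tool} {params}")
        # Record: call the live service once and store the response.
        response = self.live_call(tool, params)
        self.cache[key] = response
        return response


def fake_route_api(tool: str, params: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for a live routing service.
    return {"tool": tool, "eta_min": len(params["origin"]) + len(params["destination"])}


# Record phase: queries go to the (fake) live service and are cached.
recorder = ReplaySandbox(live_call=fake_route_api)
first = recorder.call("driving_route", {"origin": "A", "destination": "BB"})

# Replay phase: no live service; the same request, even with reordered
# parameters, returns the recorded response.
replayer = ReplaySandbox(recorded=recorder.cache)
second = replayer.call("driving_route", {"destination": "BB", "origin": "A"})
```

Refusing unseen requests in replay mode keeps results independent of live service changes. This is the property that makes agent evaluations repeatable across runs.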
route-planning, llm