Multimodal
UrbanWell: Benchmarking Multimodal Large Language Models for Spatio-Temporal Urban Wellbeing Analytics
UrbanWell is a benchmark for evaluating the spatio-temporal reasoning of multimodal large language models (MLLMs) in urban wellbeing analytics. It covers 38 cities over multiple years and integrates indicators of environmental conditions, spatial accessibility, urban form, vitality, and subjective perception, all standardized at the grid level. The benchmark evaluates 15 state-of-the-art MLLMs in a zero-shot setting and finds that performance varies across urban indicators, making it a reference point for future research on multimodal urban intelligence and analytics.
mllm, urban, benchmark