Multimodal
Reasoning for Mobile User Experience with Multimodal LLMs: Task, Benchmark, and Approach
The paper introduces UXBench, a multimodal benchmark of 2,000 visual question answering (VQA) samples designed to evaluate how well multimodal large language models (LLMs) reason about user experience (UX) in mobile user interface (UI) contexts. It also presents UI-UX, a model built on the Qwen3-VL-4B-Thinking foundation model that adds a reward routing mechanism and an asymmetric transition reward to improve reasoning efficiency. UI-UX achieves state-of-the-art performance on UXBench with an accuracy of 0.7963, outperforming Claude-4.5-Sonnet. The work underscores the need for better UX evaluation methods in multimodal applications.