ai-digest.dev
last updated 13 h ago
Multimodal · arXiv cs.AI · 7 d ago

Reasoning for Mobile User Experience with Multimodal LLMs: Task, Benchmark, and Approach

The paper introduces UXBench, a multimodal benchmark of 2,000 visual question answering (VQA) samples designed to evaluate how well multimodal large language models (LLMs) reason about user experience in mobile user interface (UI) contexts. It also presents UI-UX, a model built on the Qwen3-VL-4B-Thinking foundation model that adds a reward routing mechanism and an asymmetric transition reward to improve reasoning efficiency. UI-UX reaches an accuracy of 0.7963 on UXBench, a state-of-the-art result that outperforms Claude-4.5-Sonnet. The results underline the need for better UX evaluation methods in multimodal applications.
