arXiv cs.AI, 9 d ago

UXBench: Measuring the Actionability of LLM-Generated UX Critiques

UXBench is a new benchmark for evaluating how actionable UX critiques generated by large language models (LLMs) are. It provides local-first, runnable web fixtures across ten product-surface families and uses coverage-gated browser exploration to ensure that models gather interaction evidence before reporting. An evaluation of eight advanced models shows significant variability in report actionability and reliability, highlighting the need for tailored approaches when deploying LLMs for UX assessment.

Tags: ux, llm, benchmark
