Agents · arXiv cs.AI, 7 d ago

VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents

VISTA (VIsual Spec-To-App Benchmark) is a new benchmark for evaluating LLM-based coding agents that generate web applications from visual specifications. It defines five distinct prompt conditions for assessing agent performance with respect to visual fidelity and structural constraints. It relies on manually annotated UI components and scores outputs with three complementary methods: DOM-grounded reference matching, behavior-specific tests, and CLIP-based visual similarity. For AI practitioners, VISTA offers a structured way to measure coding agents on realistic, UI-centric tasks and to examine how visual fidelity relates to functional correctness in the generated applications.

web-app generation · LLM agents · benchmark
