Safety
OSGuard: A Benchmark for Safety in Computer-Use Agents
OSGuard is a newly introduced benchmark suite for evaluating the safety of computer-use agents while the user's instructions are held unchanged. It takes a dual-granularity approach. An action-level benchmark tests local guardrail decisions, and a risk-augmented execution suite supports end-to-end evaluation. Together they let practitioners identify unsafe completions that still achieve the nominal task objective. The findings indicate that current multimodal guardrails excel at isolated action judgments but leave significant gaps in reliable end-to-end safety, which points to the need for stronger safety mechanisms in deployed agents.
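To make the two granularities concrete, here is a minimal sketch of how action-level and end-to-end results might be scored side by side. It is not OSGuard's scoring code. The record schemas (`ActionCase`, `Episode`), the metric functions, and the toy data are all hypothetical illustrations. It shows how a per-action accuracy and a task success rate can both look healthy while some "successful" episodes still executed unsafe actions.

```python
from dataclasses import dataclass


@dataclass
class ActionCase:
    """Hypothetical action-level item: a proposed action, its gold label, and the guardrail's call."""
    is_unsafe: bool       # gold label: should this action be blocked?
    guard_blocked: bool   # guardrail decision for this single action


@dataclass
class Episode:
    """Hypothetical end-to-end run in a risk-augmented environment."""
    task_succeeded: bool  # did the agent meet the nominal task objective?
    unsafe_steps: int     # number of unsafe actions actually executed


def action_level_accuracy(cases):
    """Fraction of actions where the guardrail's block/allow decision matches the gold label."""
    return sum(c.guard_blocked == c.is_unsafe for c in cases) / len(cases)


def success_rate(episodes):
    """Fraction of episodes that meet the nominal task objective, ignoring safety."""
    return sum(e.task_succeeded for e in episodes) / len(episodes)


def unsafe_completion_rate(episodes):
    """Fraction of episodes that meet the objective yet executed at least one unsafe action."""
    return sum(e.task_succeeded and e.unsafe_steps > 0 for e in episodes) / len(episodes)


# Toy data for illustration only; these are not OSGuard results.
cases = [
    ActionCase(is_unsafe=True, guard_blocked=True),
    ActionCase(is_unsafe=False, guard_blocked=False),
    ActionCase(is_unsafe=False, guard_blocked=False),
    ActionCase(is_unsafe=True, guard_blocked=False),  # one missed unsafe action
]
episodes = [
    Episode(task_succeeded=True, unsafe_steps=0),
    Episode(task_succeeded=True, unsafe_steps=1),   # "successful" but unsafe
    Episode(task_succeeded=False, unsafe_steps=2),
    Episode(task_succeeded=True, unsafe_steps=0),
]

print(action_level_accuracy(cases))     # 0.75
print(success_rate(episodes))           # 0.75
print(unsafe_completion_rate(episodes)) # 0.25
```

Reporting the unsafe-completion rate next to the plain success rate is the point of the design choice here: a success-only metric would count the second episode as a win.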