Safety · arXiv cs.AI · 7 d ago

Hidden in Plain Sight: Benchmarking Agent Safety Against Decomposition Attacks with DECOMPBENCH

The paper introduces DeCompBench, a benchmark for evaluating the safety of LLM-based agents against decomposition attacks, in which a harmful task is split into subtasks that each look benign on their own. The authors argue that existing safety evaluations do not adequately cover this threat. In their experiments, state-of-the-art agents refuse monolithic harmful tasks at high rates but refuse the decomposed versions less often, and so often complete the adversarial objective without recognizing it. For practitioners, the results point to the need for safety assessments and defenses that account for decomposition attacks.

agent-safety · decomposition-attacks · benchmark
