ai-digest.dev
last updated 13 h ago
Safety · arXiv cs.AI · 7 d ago

Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents

The paper introduces MT-AgentRisk, a benchmark for assessing the safety of tool-using agents in multi-turn interactions. It reports a 16% increase in Attack Success Rate (ASR) in multi-turn scenarios compared with single-turn tasks. To mitigate these risks, the authors propose ToolShield, a training-free, tool-agnostic defense in which agents autonomously generate and test scenarios, reducing ASR by 30% in multi-turn contexts. The work underscores the need for robust safety measures as LLM-based agents become more capable in complex interactions.

llm · safety · multi-turn · relevance 0.00 · engagement 0.00