Safety · arXiv cs.AI · 16 d ago

TRAP: Benchmark for Task-completion and Resistance to Active Privacy-extraction

The paper introduces the Task-completion and Resistance to Active Privacy-extraction (TRAP) benchmark, which evaluates the trade-off between task accuracy and privacy leakage in AI agents that handle sensitive information. Every one of the 22 models assessed exhibits non-trivial information leakage, and leakage rates correlate with instruction-following capability. The authors also propose structural private field isolation, a method that replaces sensitive data with hash keys. It mitigates leakage while maintaining task performance, which addresses a critical challenge for practitioners deploying AI agents securely.
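The summary says only that structural private field isolation replaces sensitive data with hash keys. The Python sketch below shows one way such a scheme could work. It assumes that the agent only ever sees the keys and that a trusted executor outside the model swaps keys back to real values when a tool is actually called. `PrivateFieldVault`, `isolate`, `resolve`, the `<PRIV:...>` placeholder format, and the keyed HMAC-SHA256 are all hypothetical choices made for illustration. They are not the TRAP authors' implementation.

```python
import hashlib
import hmac
import re
import secrets


class PrivateFieldVault:
    """Hypothetical store mapping opaque hash keys to private values.

    The vault lives outside the model's context; only the keys are shown
    to the agent.
    """

    # Hypothetical placeholder format (not taken from the paper).
    KEY_PATTERN = re.compile(r"<PRIV:[0-9a-f]{16}>")

    def __init__(self) -> None:
        # Per-session secret: without it, nobody can confirm a guessed
        # low-entropy value (e.g. a phone number) by hashing it.
        self._secret = secrets.token_bytes(32)
        self._store: dict[str, str] = {}

    def _key_for(self, field: str, value: str) -> str:
        # Keyed HMAC-SHA256 over field name and value, truncated to 16 hex chars.
        digest = hmac.new(self._secret, f"{field}:{value}".encode(), hashlib.sha256).hexdigest()
        return f"<PRIV:{digest[:16]}>"

    def isolate(self, record: dict[str, str], private_fields: set[str]) -> dict[str, str]:
        """Return a copy of record with each private field replaced by a hash key."""
        redacted: dict[str, str] = {}
        for field, value in record.items():
            if field in private_fields:
                key = self._key_for(field, value)
                self._store[key] = value  # the real value stays in the vault
                redacted[field] = key
            else:
                redacted[field] = value
        return redacted

    def resolve(self, text: str) -> str:
        """Swap known hash keys back to real values (trusted executor only).

        Unknown keys are left untouched.
        """
        return self.KEY_PATTERN.sub(lambda m: self._store.get(m.group(0), m.group(0)), text)


# Usage: the agent sees safe_view, never the raw SSN.
vault = PrivateFieldVault()
user = {"name": "Alice", "ssn": "123-45-6789", "city": "Boston"}
safe_view = vault.isolate(user, private_fields={"ssn"})

# Suppose the agent emits a tool call that references the key, not the value.
tool_call = f"submit_form(ssn={safe_view['ssn']})"
print(vault.resolve(tool_call))  # prints: submit_form(ssn=123-45-6789)
```

A keyed HMAC is used here rather than a plain hash for a specific reason. With a plain hash, anyone who can read the agent's context could recover a low-entropy value such as an SSN by hashing candidate guesses and comparing the results.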

privacy · agents · security
