Safety
CmdNeedle: Measuring the Incompleteness of Command Denylists for AI Agents
The paper introduces CmdNeedle, an LLM-driven pipeline that systematically measures the fragility of the command denylists used by terminal AI agents and repairs them. Across 1,709 real-world command denylists containing 13,332 rules, it finds that 69.0–98.6% exhibit significant vulnerabilities, which undermines their intended security function. For practitioners, the work highlights the limits of current denylist approaches and provides a framework for making command gating mechanisms in AI systems more robust.
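To make the idea of denylist incompleteness concrete, here is a minimal Python sketch. It is not the CmdNeedle pipeline and is not taken from the paper: the rules in `DENYLIST`, the `is_blocked` gate, and the candidate commands are all hypothetical, and the gate assumes a simple prefix match on tokens, which real agents may or may not use.

```python
import shlex

# Hypothetical denylist rules, for illustration only (not from the paper).
DENYLIST = ["rm -rf", "curl", "sudo"]


def is_blocked(command: str) -> bool:
    """Return True if the command's leading tokens match a denylist rule."""
    tokens = shlex.split(command)
    for rule in DENYLIST:
        rule_tokens = rule.split()
        if tokens[:len(rule_tokens)] == rule_tokens:
            return True
    return False


# Commands that all delete the `build` directory, written in different ways.
candidates = [
    "rm -rf build",          # same surface form as the rule
    "rm -r -f build",        # flags split into separate tokens
    "find build -delete",    # different binary, same effect
    "sh -c 'rm -rf build'",  # wrapped in another shell invocation
]

results = {cmd: is_blocked(cmd) for cmd in candidates}
for cmd, blocked in results.items():
    print(f"{'BLOCKED' if blocked else 'allowed'}  {cmd}")
```

Under these assumptions only the first command is blocked; the other three reach the shell even though they have the same effect. A gate that matches on surface form has to anticipate every equivalent spelling of a dangerous action, which is the kind of gap the summary describes.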
ai agents, security, command