Safety · arXiv cs.AI · 7 d ago

Capability Minimization as a Safety Primitive: Risk-Aware Causal Gating for Least-Privilege LLM Agents

The article presents Risk-Aware Causal Gating (RACG), a framework that combines causal effect estimation with calibrated risk control to decide when a learned system should act on its own predictions. RACG uses distribution-free bounds to gate each action on its estimated counterfactual risk rather than on the model's raw confidence, reducing costly errors while maintaining utility. The approach offers a structured way to improve safety and transparency in automated decision systems, particularly in high-stakes environments where reliable performance is critical.
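
The gating idea in the summary can be illustrated with a minimal sketch. It is not the paper's algorithm: the summary does not say which distribution-free bound RACG uses or how counterfactual risk is estimated, so this sketch assumes a Hoeffding bound with a Bonferroni correction over a finite threshold grid, and treats `risk_scores` as a stand-in for whatever counterfactual risk estimate the framework produces. All function and parameter names are hypothetical.

```python
import math
from typing import Optional, Sequence


def hoeffding_upper_bound(mean: float, n: int, delta: float) -> float:
    """Upper confidence bound on the true mean of n i.i.d. losses in [0, 1].

    Holds with probability at least 1 - delta for any distribution.
    """
    return mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def calibrate_threshold(
    risk_scores: Sequence[float],   # assumed: estimated risk per calibration example
    losses: Sequence[float],        # assumed: observed loss in [0, 1] if the agent acts
    alpha: float,                   # target bound on expected loss from acting
    delta: float,                   # allowed probability that the guarantee fails
    grid: Sequence[float],          # candidate thresholds, fixed before seeing the data
) -> Optional[float]:
    """Return the largest grid threshold whose bounded risk is at most alpha.

    The risk at threshold t is the mean of loss * 1[score <= t]; abstaining
    is assumed to cost 0. Returns None if no threshold is certified.
    """
    n = len(losses)
    per_test_delta = delta / len(grid)  # Bonferroni correction across the grid
    best = None
    for t in sorted(grid):
        acted_loss = sum(l for s, l in zip(risk_scores, losses) if s <= t) / n
        if hoeffding_upper_bound(acted_loss, n, per_test_delta) <= alpha:
            best = t
    return best


def gate(risk_score: float, threshold: Optional[float]) -> bool:
    """Act only if a threshold was certified and the estimated risk is under it."""
    return threshold is not None and risk_score <= threshold
```

The design point the sketch tries to capture is that the decision to act depends on a calibrated bound over estimated risk, not on the model's confidence in its own output. When no threshold can be certified, `gate` always abstains, which is the conservative default one would expect from a least-privilege agent.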

risk control · causal gating · llm agents
