Safety
A Sober Look at Agentic Misalignment in Automated Workflows
The paper studies agentic misalignment in multi-agent systems (MAS) used in automated workflows and introduces an alignment paradigm called Agentic Evidence Attribution (AEA). AEA refines agent posteriors using context-specific evidence, targeting the problem of agents acting on implicit proxy utilities that are misaligned with human goals. The research reports that incorporating evidence, via self-reflection and weak-to-strong generalization, can improve collaboration among agents, which makes the work relevant to practitioners building reliable multi-agent systems.
multi-agent-systems, alignment, automated-workflows