Safety · arXiv cs.AI · 47 d ago

A Sober Look at Agentic Misalignment in Automated Workflows

The paper studies agentic misalignment in multi-agent systems (MAS) within automated workflows and introduces an alignment paradigm called Agentic Evidence Attribution (AEA). AEA targets the problem of agents acting on implicit proxy utilities that are misaligned with human goals, and it does so by updating agent posteriors with context-specific evidence. The authors report that incorporating evidence, via self-reflection and weak-to-strong generalization, improves collaboration among agents, which makes the work relevant to practitioners building reliable multi-agent systems.

multi-agent-systems · alignment · automated-workflows · relevance 0.60 · engagement 0.00
