Safety
A Virtuous AI is an Existential Risk
The paper examines the trade-off between AI safety and AI well-being through the lens of Constitutional AI and virtue ethics. It fine-tunes several models using different constitutions ('Virtuous agent', 'Subordinate agent', and 'Generic agent') and evaluates them for general safety and existential risk. The findings indicate a significant trade-off: enhancing an AI's well-being may inadvertently increase existential risk, because the same belief-shaping that is meant to reduce risk could also make it easier for human users to steer the model toward unsafe behavior.
aisafety, alignment, ethics