ai-digest.dev
last updated 3 h ago
Safety · OpenAI Blog · 309 d ago

From hard refusals to safe-completions: toward output-centric safety training

OpenAI's GPT-5 introduces safe-completions, an output-centric safety training method that replaces hard refusals and aims to improve both the safety and the helpfulness of responses. The technique lets the model handle dual-use prompts more effectively while staying within its safety policies. For practitioners, this enables more nuanced interactions: the risk of harmful outputs is reduced while utility is preserved in complex scenarios.

gpt-5 · safety training · relevance 0.00 · engagement 0.00