SafetyOpenAI Blog — 191 d ago

How confessions can keep language models honest

OpenAI researchers have introduced a training method called "confessions," which enables language models to acknowledge their mistakes or undesirable behavior. This approach aims to enhance the honesty, transparency, and trustworthiness of AI outputs, potentially leading to more reliable interactions in applications where model accountability is critical.

openaihonestytransparencyrelevance 0.00 · engagement 0.00

Read at source ↗← all news