Safety
How confessions can keep language models honest
OpenAI researchers have introduced a training method called "confessions," which enables language models to acknowledge their mistakes or undesirable behavior. This approach aims to enhance the honesty, transparency, and trustworthiness of AI outputs, potentially leading to more reliable interactions in applications where model accountability is critical.
openaihonestytransparency