Safety · arXiv cs.AI — 47 d ago

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

TruthRL is a reinforcement learning framework for improving the truthfulness of large language models (LLMs). It optimizes for both accurate answers and appropriate abstention when the model is uncertain. Implemented with Group Relative Policy Optimization (GRPO), TruthRL uses a ternary reward that distinguishes correct answers, hallucinations, and abstentions. Across four knowledge-intensive benchmarks, this reduces hallucinations from 43.5% to 19.4% and raises truthfulness from 5.3% to 37.2%. For practitioners, the approach addresses accuracy and uncertainty management together: a model that abstains instead of guessing is safer to deploy in real-world applications.
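To make the ternary reward concrete, here is a minimal sketch in Python. The reward values (+1 for a correct answer, 0 for an abstention, -1 for a hallucination), the outcome labels, and the helper names `ternary_reward` and `group_relative_advantages` are illustrative assumptions, not the paper's code. The advantage step follows the general GRPO idea of normalizing each sampled response's reward against the other responses in its group; the use of the population standard deviation here is also an assumption.

```python
from statistics import mean, pstdev

# Hypothetical outcome labels; a real pipeline would need a verifier or
# judge to assign them to each sampled response.
CORRECT, ABSTAIN, HALLUCINATION = "correct", "abstain", "hallucination"

# Assumed reward values: abstaining is neutral, hallucinating is penalized.
REWARD = {CORRECT: 1.0, ABSTAIN: 0.0, HALLUCINATION: -1.0}


def ternary_reward(outcome: str) -> float:
    """Map a judged outcome to its scalar reward."""
    return REWARD[outcome]


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against sigma == 0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, already judged.
outcomes = [CORRECT, ABSTAIN, HALLUCINATION, HALLUCINATION]
rewards = [ternary_reward(o) for o in outcomes]
advantages = group_relative_advantages(rewards)

for outcome, adv in zip(outcomes, advantages):
    print(f"{outcome:>13}: advantage {adv:+.3f}")
```

Under these assumed values, an abstention earns a higher advantage than a hallucination but a lower one than a correct answer, which is the ordering a ternary scheme is meant to produce. A binary correct/incorrect reward would score abstentions and hallucinations identically.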

truthfulness · llm · reinforcement learning · relevance 0.70 · engagement 0.00
