Training · arXiv cs.CL · 11 d ago

BALTO: Balanced Token-Level Policy Optimization for Hallucination Mitigation

The paper introduces BALTO (Balanced Token-Level Policy Optimization), a framework for mitigating hallucinations in large language models (LLMs). BALTO augments reinforcement learning with a balanced token-level credit assignment mechanism that redistributes probability mass from unsupported to supported content, which the authors report improves training stability and optimization efficiency. On ConFiQA, RAGTruth, and FinLLM-Eval, BALTO achieves higher faithfulness across multiple model-benchmark settings and outperforms existing post-training methods at balancing faithfulness and informativeness.

hallucination · reinforcement-learning · policy-optimization · relevance 0.00 · engagement 0.00
