Training · arXiv cs.CL · 11 d ago

CoRA: Confidence-Rationale Alignment for Reliable Chain-of-Thought Reasoning

The paper introduces CoRA, a reinforcement learning framework built on GRPO (Group Relative Policy Optimization) that improves confidence-rationale alignment in chain-of-thought (CoT) reasoning for large language models (LLMs). It jointly optimizes three signals: answer correctness, the probability assigned to the committed answer, and rationale support, scored with a rubric covering grounding, coherence, and task relevance. The authors report a reduction in confidence-rationale alignment error of up to 26.51% on datasets including MedQA, MathQA, and OpenBookQA, which suggests that coherent rationales matter for reliable CoT reasoning.
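To make the joint objective concrete, here is a minimal sketch of how three such signals might be folded into one scalar reward and then turned into group-relative advantages, the normalization step GRPO is known for. The summary does not say how CoRA actually combines its terms, so the weighted sum, the weights, and the names `Rollout`, `composite_reward`, and `group_advantages` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    """One sampled chain-of-thought completion for a prompt (hypothetical container)."""
    correct: bool       # whether the committed answer matches the gold answer
    answer_prob: float  # model probability on its committed answer, in [0, 1]
    rubric: float       # rationale-support score in [0, 1] (grounding, coherence, task relevance)


def composite_reward(r: Rollout, w_correct: float = 1.0,
                     w_conf: float = 0.5, w_rubric: float = 0.5) -> float:
    # Assumption: a plain weighted sum. The paper may use a different
    # combination; the weights here are illustrative, not reported values.
    return (w_correct * float(r.correct)
            + w_conf * r.answer_prob
            + w_rubric * r.rubric)


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    # GRPO-style normalization: each reward is centered and scaled by the
    # statistics of its own sampling group, so no learned value function is needed.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


if __name__ == "__main__":
    group = [
        Rollout(correct=True, answer_prob=0.9, rubric=0.8),
        Rollout(correct=True, answer_prob=0.6, rubric=0.3),
        Rollout(correct=False, answer_prob=0.7, rubric=0.4),
    ]
    rewards = [composite_reward(r) for r in group]
    for r, a in zip(rewards, group_advantages(rewards)):
        print(f"reward={r:.3f} advantage={a:+.3f}")
```

Under this sketch, a correct answer backed by a well-supported rationale earns a higher advantage than an equally correct answer with a weak rationale, which is the kind of pressure toward confidence-rationale alignment the summary describes.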

chain-of-thought · reasoning · llm
