Training
CoRA: Confidence-Rationale Alignment for Reliable Chain-of-Thought Reasoning
The paper introduces CoRA, a GRPO-based reinforcement learning framework that improves confidence-rationale alignment in chain-of-thought (CoT) reasoning with large language models (LLMs). The framework jointly optimizes three signals: answer correctness, committed-answer probability, and rationale support, where rationale support is scored with a rubric covering grounding, coherence, and task relevance. The reported results show a reduction in confidence-rationale alignment error of up to 26.51% on datasets such as MedQA, MathQA, and OpenBookQA, which underscores how much coherent rationales matter for reliable CoT reasoning in practice.
chain-of-thought, reasoning, llm
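To make the joint objective in the summary more concrete, here is a minimal sketch of how three such signals might be folded into one scalar reward and then turned into group-relative advantages, the normalization step that GRPO-style training uses. The summary does not give the paper's actual reward formula, so the weighted sum, the weights, the unweighted rubric average, and the names `rubric_score`, `combined_reward`, and `group_relative_advantages` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def rubric_score(grounding: float, coherence: float, relevance: float) -> float:
    """Average the three rubric dimensions, each assumed to lie in [0, 1].

    Assumption: an unweighted mean. The paper's rubric aggregation is not
    described in the summary.
    """
    return (grounding + coherence + relevance) / 3.0


def combined_reward(correct: bool, answer_prob: float, support: float,
                    weights: tuple = (1.0, 0.5, 0.5)) -> float:
    """Hypothetical weighted sum of correctness, committed-answer
    probability, and rationale support. The weights are illustrative."""
    w_correct, w_prob, w_support = weights
    return w_correct * float(correct) + w_prob * answer_prob + w_support * support


def group_relative_advantages(rewards: list, eps: float = 1e-8) -> list:
    """Normalize rewards within one group of sampled responses:
    subtract the group mean and divide by the group standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled responses to the same prompt (made-up numbers).
group = [
    combined_reward(True, 0.90, rubric_score(0.9, 0.8, 1.0)),
    combined_reward(True, 0.55, rubric_score(0.4, 0.5, 0.6)),
    combined_reward(False, 0.70, rubric_score(0.3, 0.2, 0.4)),
]
advantages = group_relative_advantages(group)
```

Under these assumptions, a response that is correct, confident, and well supported receives a positive advantage relative to its group, while a confident but poorly supported wrong answer receives a negative one.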