Research
What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis
This work presents an information-theoretic analysis of Latent Chain-of-Thought (CoT) reasoning. It addresses why latent reasoning is hard to make robust when only weak outcome supervision is available. It identifies two dimensions of supervision, Trajectory Supervision and Space Supervision, and proposes the Unified Latent Probe (ULP) to measure the mutual information between latent trajectories and reasoning steps. The findings suggest that supervision should move away from geometric imitation and toward maximizing this mutual information in order to improve reasoning accuracy. This gives practitioners a structured way to optimize latent reasoning frameworks.
latent-coT, supervision, information-theory, reasoning
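The entry does not say how ULP is constructed. To illustrate what "measuring mutual information between latent states and reasoning steps" can look like in practice, here is a minimal sketch that substitutes a standard probe-based estimator, the Barber–Agakov variational lower bound I(Z; S) ≥ H(S) + E[log q(s | z)], where q is a trained probe. It is not the paper's ULP. It assumes three things: each latent vector is paired with a discrete reasoning-step label, the probe is a linear softmax classifier, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np


def entropy_bits(labels: np.ndarray, num_classes: int) -> float:
    """Plug-in (empirical) entropy H(S) of discrete step labels, in bits."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def fit_linear_probe(z, labels, num_classes, lr=0.2, steps=500, l2=1e-3):
    """Fit a softmax-regression probe q(s | z) by full-batch gradient descent.

    Hypothetical helper: a linear probe stands in for whatever ULP actually uses.
    """
    mean, std = z.mean(axis=0), z.std(axis=0) + 1e-8
    x = (z - mean) / std  # standardize so a fixed learning rate stays stable
    n, d = x.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    y = np.eye(num_classes)[labels]  # one-hot step labels
    for _ in range(steps):
        logits = x @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (x.T @ grad + l2 * W)  # small L2 penalty on the weights
        b -= lr * grad.sum(axis=0)
    return mean, std, W, b


def probe_mi_lower_bound(z_train, s_train, z_test, s_test, num_classes):
    """Variational lower bound I(Z; S) >= H(S) + E[log2 q(s | z)], in bits.

    Evaluated on held-out data. A weak probe gives a loose (possibly negative)
    estimate, so this measures information the probe can actually decode.
    """
    mean, std, W, b = fit_linear_probe(z_train, s_train, num_classes)
    logits = ((z_test - mean) / std) @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    log_q = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce_bits = -log_q[np.arange(len(s_test)), s_test].mean() / np.log(2)
    return entropy_bits(s_test, num_classes) - ce_bits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 2000, 16, 4  # latent vectors, latent width, number of step labels
    labels = rng.integers(0, k, size=n)
    centers = 3.0 * rng.normal(size=(k, d))
    z_informative = centers[labels] + rng.normal(size=(n, d))  # latents encode the step
    z_random = rng.normal(size=(n, d))  # latents ignore the step
    h = n // 2
    for name, z in [("informative", z_informative), ("random", z_random)]:
        mi = probe_mi_lower_bound(z[:h], labels[:h], z[h:], labels[h:], k)
        print(f"{name}: ~{mi:.2f} bits (ceiling log2({k}) = {np.log2(k):.0f})")
```

On this synthetic data, latents that encode the step should score close to the 2-bit ceiling, and random latents should score near or below zero. A held-out split is used because a probe scored on its own training data would overstate how much step information the latents carry.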