Research
SuperThoughts: Reasoning Tokens in Superposition
SuperThoughts is an approach to long Chain-of-Thought (CoT) reasoning in large language models (LLMs) that compresses consecutive CoT tokens into single latent representations and decodes two tokens at a time through a Multi-Token Prediction (MTP) module. The method improves inference throughput by 2x while preserving discrete token supervision during training. It has been applied by fine-tuning models such as Qwen2.5-Math-1.5B-Instruct and evaluated on benchmarks such as MATH500 and OlympiadBench, achieving a 20-30% reduction in CoT length with only a minimal accuracy drop. This matters for practitioners who want to lower the computational cost of complex reasoning tasks without giving up much accuracy.
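To make the pairing idea concrete, here is a minimal sketch in plain Python. It is not the SuperThoughts implementation: the summary does not say how two tokens are merged into one latent, so the element-wise mean in `superpose` is an assumption chosen only for illustration, and the names `superpose`, `compress_pairs`, and `decoding_steps` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def superpose(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Merge two token embeddings into one latent vector.

    Assumption: an element-wise mean stands in for whatever learned
    compression the actual method uses.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def compress_pairs(embeddings: Sequence[Sequence[float]]) -> List[Vector]:
    """Compress consecutive embeddings pairwise; an odd tail is kept as-is."""
    latents: List[Vector] = []
    for i in range(0, len(embeddings) - 1, 2):
        latents.append(superpose(embeddings[i], embeddings[i + 1]))
    if len(embeddings) % 2 == 1:
        latents.append(list(embeddings[-1]))
    return latents


def decoding_steps(num_tokens: int, tokens_per_step: int = 2) -> int:
    """Forward passes needed when each step emits `tokens_per_step` tokens."""
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    return -(-num_tokens // tokens_per_step)  # ceiling division


if __name__ == "__main__":
    toy_embeddings = [[1.0, 3.0], [3.0, 5.0], [2.0, 2.0]]
    print(compress_pairs(toy_embeddings))
    print(decoding_steps(8, tokens_per_step=1), decoding_steps(8))
```

Under these assumptions, emitting two tokens per forward pass halves the number of decoding steps for a chain of fixed length, which is the idealized source of a 2x throughput figure. Real speedups would also depend on the overhead of the MTP module, which this sketch ignores.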
llm reasoning superthoughts