Research
EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models
The paper introduces Endogenous Chain-of-Thought (EndoCoT), a framework for strengthening reasoning in Multimodal Large Language Models (MLLMs) integrated with diffusion models. It targets two limitations: limited reasoning depth and guidance that stays invariant during decoding. To address them, it adds an iterative thought guidance module and a terminal thought grounding module. The framework achieved an average accuracy of 92.1% across various benchmarks, outperforming the strongest baseline by 8.3 percentage points, a result relevant to practitioners who need stronger reasoning from AI systems on complex tasks.
chain-of-thought diffusion-models reasoning