Research
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling
UniT is a framework for multimodal chain-of-thought test-time scaling that lets a unified model reason about, verify, and refine its outputs over multiple rounds. Key findings include that models trained on short reasoning trajectories generalize to longer inference chains, and that sequential reasoning is more compute-efficient than parallel sampling. The approach improves both generation and understanding in unified multimodal architectures, and it targets complex tasks that require iterative correction.
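The reason, verify, refine loop described above can be illustrated with a minimal sketch. This is not UniT's implementation: the function `sequential_refine`, its `generate`/`verify`/`refine` callables, and the `max_rounds` budget are all hypothetical names chosen for illustration, and the toy callables operate on strings rather than on a real multimodal model.

```python
from typing import Callable, Tuple


def sequential_refine(
    prompt: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Tuple[bool, str]],
    refine: Callable[[str, str, str], str],
    max_rounds: int = 4,
) -> Tuple[str, int]:
    """Generate once, then verify and refine until accepted or out of budget.

    Returns the final output and the number of generation rounds used.
    All callables are hypothetical stand-ins for model calls.
    """
    output = generate(prompt)
    rounds = 1
    while rounds < max_rounds:
        ok, feedback = verify(prompt, output)
        if ok:
            break  # verifier accepted the output; stop spending compute
        output = refine(prompt, output, feedback)
        rounds += 1
    return output, rounds


# Toy stand-ins (assumptions for illustration only): the "task" is to
# produce every word of the prompt, and each refinement fixes one omission.
def toy_generate(prompt: str) -> str:
    return prompt.split()[0]


def toy_verify(prompt: str, output: str) -> Tuple[bool, str]:
    missing = [w for w in prompt.split() if w not in output.split()]
    return (not missing, missing[0] if missing else "")


def toy_refine(prompt: str, output: str, feedback: str) -> str:
    return output + " " + feedback


final, rounds = sequential_refine(
    "three red cubes", toy_generate, toy_verify, toy_refine
)
print(final, rounds)
```

In this sketch each round builds on the previous output and the verifier's feedback, which is the sense in which the reasoning is sequential; a parallel-sampling baseline would instead draw several independent outputs and pick one. Raising `max_rounds` at inference time is the toy analogue of running a longer chain than was seen in training.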
multimodal, chain-of-thought, test-time scaling