Is Code Better Than Language for Algorithmic Reasoning?
The paper compares natural-language reasoning with code-execution pipelines on algorithmic reasoning tasks. It introduces an intermediate intervention in which the model expresses its reasoning as executable code, and reports that deterministic code execution improves on natural-language reasoning by 31.6 percentage points across a 40-task benchmark. The results suggest that reliable external execution is crucial to the gain. A statistical decision-theoretic model supports the findings by delineating the conditions under which execution outperforms end-to-end risk management.
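To make the idea of external execution concrete, here is a minimal sketch of running model-written code in a separate interpreter rather than trusting a prose answer. It is an illustration under stated assumptions, not the paper's pipeline: the task (counting inversions), the `GENERATED_CODE` string, and the `run_generated_code` helper are all hypothetical.

```python
import subprocess
import sys

# Hypothetical model output: the reasoning for "count inversions in a list"
# written as a program instead of as prose. Not taken from the paper.
GENERATED_CODE = """
def count_inversions(xs):
    return sum(
        1
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[i] > xs[j]
    )

print(count_inversions([3, 1, 2, 5, 4]))
"""


def run_generated_code(source: str, timeout_s: float = 5.0) -> str:
    """Run a code string in a fresh Python interpreter and return its stdout.

    Hypothetical helper: the interpreter, not the model, computes the answer.
    """
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode output as str
        timeout=timeout_s,    # stop runaway programs
        check=True,           # raise if the program exits with an error
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_generated_code(GENERATED_CODE))
```

In this sketch the final answer comes from the interpreter, so identical code yields an identical result on every run. A real pipeline would also need sandboxing before executing untrusted model output.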
algorithmic reasoning, code, llm