Research
ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics
ComBench is a new benchmark for evaluating combinatorial reasoning in large language models. It comprises 100 human-annotated Olympiad-level problems, divided into analysis-centric and construction-centric tasks. Evaluation combines rubric-guided proof grading with deterministic construction verification. The results show that even top models such as Kimi-K2.6 and GPT-5.5 exhibit significant performance gaps in both rigorous proof reasoning and constructive realization. The benchmark points to a need for stronger creative mathematical reasoning, which matters to practitioners building AI systems for complex problem-solving in combinatorics.
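To make "deterministic construction verification" concrete, here is a minimal sketch of what such a checker could look like for one construction-style task. The problem (find a large subset of {1, ..., n} with no three-term arithmetic progression) and the names `is_ap_free` and `verify_construction` are illustrative assumptions, not taken from ComBench; the benchmark's actual verifiers are not described in this summary.

```python
from itertools import combinations


def is_ap_free(candidate, n):
    """Check that candidate is a list of distinct integers in 1..n
    containing no three-term arithmetic progression.

    Hypothetical task, chosen only for illustration.
    """
    values = set(candidate)
    # Reject duplicates and out-of-range or non-integer entries.
    if len(values) != len(candidate):
        return False
    if any(not isinstance(v, int) or v < 1 or v > n for v in values):
        return False
    # For each pair a < c, the progression a, b, c exists exactly when
    # the midpoint b = (a + c) / 2 is an integer that is also in the set.
    for a, c in combinations(sorted(values), 2):
        if (a + c) % 2 == 0 and (a + c) // 2 in values:
            return False
    return True


def verify_construction(candidate, n, min_size):
    """Accept a model's construction only if it is large enough and valid."""
    return len(candidate) >= min_size and is_ap_free(candidate, n)
```

A checker of this kind returns the same verdict on every run for a given answer, so construction-centric answers can be scored without a human or model judge. For example, `verify_construction([1, 2, 4, 5], n=5, min_size=4)` accepts, while `[1, 2, 3]` is rejected because 1, 2, 3 is itself a progression.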
llm, combinatorics, benchmark