ai-digest.dev
last updated 13 h ago
Research · arXiv cs.AI · 4 d ago

ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics

ComBench is a new benchmark for evaluating combinatorial reasoning in large language models. It comprises 100 human-annotated Olympiad-level problems, split into analysis-centric and construction-centric tasks. Evaluation combines rubric-guided proof grading with deterministic construction verification. The results show that even top models such as Kimi-K2.6 and GPT-5.5 exhibit significant performance gaps in both rigorous proof reasoning and constructive realization. The benchmark highlights the need for stronger creative mathematical reasoning, a capability that matters to practitioners building AI systems for complex problem-solving in combinatorics.

llm · combinatorics · benchmark