Research · arXiv cs.AI · 14 d ago

CombEval: A Framework for Evaluating Combinatorial Counting in Large Language Models

CombEval is a dynamic benchmark for evaluating combinatorial counting in large language models (LLMs). It uses a typed Cofola specification to generate natural-language counting problems systematically, with controlled variation in object types and constraints. An evaluation of 11 LLMs reveals persistent weaknesses on ordered objects and nested dependencies, which makes CombEval useful for diagnosing the limits of LLM combinatorial reasoning.
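To make the idea of spec-driven generation concrete, here is a minimal Python sketch. It is not Cofola syntax and not CombEval's implementation: `ToySpec`, `render`, `ground_truth`, and the "first book must be included" constraint are all hypothetical names invented for illustration. It only shows how a small typed description can yield both a problem statement and a brute-force reference answer, and how flipping one field (ordered versus unordered) changes the count.

```python
from dataclasses import dataclass
from itertools import combinations, permutations


@dataclass(frozen=True)
class ToySpec:
    """Hypothetical stand-in for a typed problem specification."""
    n_items: int                      # size of the pool of distinct items
    k: int                            # how many items are picked
    ordered: bool                     # True: sequences; False: sets
    must_include_first: bool = False  # toy constraint: item 0 must be picked


def render(spec: ToySpec) -> str:
    """Turn the spec into a natural-language counting question."""
    verb = "arrange in a row" if spec.ordered else "choose"
    text = (
        f"In how many ways can you {verb} {spec.k} "
        f"of {spec.n_items} distinct books?"
    )
    if spec.must_include_first:
        text += " The first book must be included."
    return text


def ground_truth(spec: ToySpec) -> int:
    """Reference answer by exhaustive enumeration (small inputs only)."""
    enumerate_picks = permutations if spec.ordered else combinations
    return sum(
        1
        for pick in enumerate_picks(range(spec.n_items), spec.k)
        if not spec.must_include_first or 0 in pick
    )


if __name__ == "__main__":
    # Vary one dimension at a time: object type, then constraint.
    for spec in (
        ToySpec(5, 3, ordered=False),
        ToySpec(5, 3, ordered=True),
        ToySpec(5, 3, ordered=True, must_include_first=True),
    ):
        print(render(spec), "->", ground_truth(spec))
```

Because the answer comes from enumeration rather than a closed-form formula, the same checker covers every variant the toy spec can express, at the cost of scaling only to small instances.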

