Research
Operadic consistency: a label-free signal for compositional reasoning failures in LLMs
The paper introduces "operadic consistency" (OC), a diagnostic signal that identifies reasoning failures in large language models (LLMs) at inference time without relying on ground-truth labels. Evaluated on twelve instruction-tuned LLMs ranging from 4B to 671B parameters across four multi-hop QA datasets, OC correlates strongly with accuracy (Pearson $r \in [0.86, 0.94]$) and outperforms existing methods such as chain-of-thought self-consistency. According to the summary, OC also provides additional information that improves selective-prediction accuracy, which makes it useful to practitioners who want to detect and reduce LLM reasoning failures.
operadic-consistency reasoning-failures llm