Research · arXiv cs.AI · 8 d ago

Recurrent Reasoning on Symbolic Puzzles with Sequence Models

The authors introduce RecurrReason, a benchmark for evaluating reasoning on symbolic puzzles. It comprises 10,817 unique puzzles across four types, each with a difficulty parameter ranging from 1 to 10. They benchmark two Transformer architectures, T5 (encoder-decoder) and GPT-2 (decoder-only). Fine-tuned T5 reaches 97.27% validation accuracy but generalizes poorly to out-of-distribution tasks, notably scoring 0% on River Crossing. The results suggest that architectural choices matter more than model scale on reasoning tasks, a point relevant to practitioners building robust AI systems.

reasoning · symbolic puzzles · sequence models
