Models · arXiv cs.AI · 21 h ago

ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

ASyMOB is a benchmark of 35,368 validated symbolic math problems spanning domains such as integration and differential equations. It systematically applies perturbations to seed problems, which allows a detailed evaluation of how well LLMs generalize and how robust they are to minor changes. The findings indicate that most models struggle under perturbation, that integrated code tools improve performance, and that in notable instances LLMs outperform traditional Computer Algebra Systems, which suggests new avenues for hybrid approaches in AI-driven scientific discovery.
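To make the seed-and-perturbation idea concrete, here is a toy sketch in Python. It is an illustration only, not the benchmark's actual pipeline: it is restricted to polynomial integrands, and the helper names (`perturb`, `integrate`, `is_valid_antiderivative`) and the choice of coefficient scaling as the perturbation are assumptions made for this example.

```python
from fractions import Fraction

# A polynomial is a list of coefficients, lowest degree first:
# [c0, c1, c2] stands for c0 + c1*x + c2*x^2.

def perturb(seed, scale):
    """Hypothetical perturbation: multiply every coefficient of the seed by `scale`."""
    return [Fraction(scale) * c for c in seed]

def integrate(poly):
    """Reference antiderivative, with the constant of integration set to 0."""
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(poly)]

def differentiate(poly):
    """Exact derivative of a coefficient-list polynomial."""
    return [k * Fraction(c) for k, c in enumerate(poly)][1:]

def trim(poly):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    poly = list(poly)
    while poly and poly[-1] == 0:
        poly.pop()
    return poly

def is_valid_antiderivative(candidate, integrand):
    """Validate an answer by differentiating it and comparing with the integrand."""
    return trim(differentiate(candidate)) == trim(integrand)

seed = [1, 0, 3]  # seed problem: integrate 1 + 3x^2
variants = [perturb(seed, s) for s in (1, 2, Fraction(1, 3))]
references = [integrate(v) for v in variants]
```

Validating by differentiation rather than by comparing against a single stored answer means a candidate that differs only by a constant of integration is still accepted, while an answer to the unperturbed seed is rejected for a perturbed variant.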

llm · benchmark · symbolic mathematics
