Research · arXiv cs.AI · 12 d ago

EngTrace: A Symbolic Benchmark for Verifiable Process Supervision of Engineering Reasoning

EngTrace is a new symbolic benchmark for evaluating how well Large Language Models (LLMs) reason about engineering problems. It consists of 90 parameterized templates that generate 1,350 unique problem instances (15 per template on average) across several engineering domains, and it uses a two-stage evaluation framework to make process supervision verifiable. The benchmark highlights a trade-off in LLMs between numeric precision and reasoning-trace fidelity, which suggests that training on abstract mathematics may not adequately prepare models for complex engineering tasks.
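To make the idea of a parameterized template concrete, here is a minimal sketch of how one template could produce several instances with a computed reference answer and reference intermediate values. It is an illustration only, not EngTrace's code: the cantilever problem, the names `Instance`, `cantilever_template`, `generate_instances`, and `evaluate`, and the choice of what the two evaluation stages check (final answer within a tolerance, then agreement of intermediate steps) are all assumptions made for this example.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Instance:
    question: str
    steps: dict      # reference intermediate quantities: name -> value
    answer: float    # reference final answer


def cantilever_template(rng):
    """Hypothetical template: tip deflection of an end-loaded cantilever."""
    P = rng.randrange(500, 5001, 100)   # tip load in N
    L = rng.randrange(10, 51) / 10      # beam length in m
    E = 200e9                           # Young's modulus in Pa (steel)
    b, h = 0.05, 0.10                   # rectangular cross-section in m
    I = b * h**3 / 12                   # second moment of area in m^4
    EI = E * I                          # flexural rigidity in N*m^2
    delta = P * L**3 / (3 * EI)         # tip deflection in m
    question = (
        f"A steel cantilever {L:.1f} m long with a 50 mm x 100 mm section "
        f"carries a tip load of {P} N. Find the tip deflection in metres."
    )
    return Instance(question, {"I": I, "EI": EI}, delta)


def generate_instances(template, n, seed=0):
    """Draw parameters until n instances with distinct questions exist."""
    rng = random.Random(seed)
    seen, out = set(), []
    while len(out) < n:
        inst = template(rng)
        if inst.question not in seen:
            seen.add(inst.question)
            out.append(inst)
    return out


def evaluate(instance, model_answer, model_steps, rel_tol=1e-2):
    """Assumed two-stage check: final answer first, then the trace."""
    # Stage 1 (assumption): is the final number within tolerance?
    answer_ok = math.isclose(model_answer, instance.answer, rel_tol=rel_tol)
    # Stage 2 (assumption): fraction of reference steps the trace reproduces.
    matched = sum(
        1
        for name, ref in instance.steps.items()
        if name in model_steps
        and math.isclose(model_steps[name], ref, rel_tol=rel_tol)
    )
    return answer_ok, matched / len(instance.steps)
```

Scoring the final answer and the trace separately is what lets a trade-off like the one described above show up: in this sketch a response can pass the first check while reproducing none of the reference steps, or the reverse.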

llm · engineering · benchmark
