Research · arXiv cs.AI · 8 d ago

Measuring Whether LLM Tutors Teach or Solve: A Diagnostic for Educational Impact

This article presents a diagnostic framework for evaluating the educational impact of large language models (LLMs) used as tutors, distinguishing task-solving ability from learning support. Analyzing results from the MathTutorBench leaderboard, the authors find only a moderate correlation (0.421) between solving-oriented and pedagogy-oriented performance, so a model can excel on one while underperforming on the other. The authors argue that public benchmarks should report solving and pedagogy scores separately to better assess educational efficacy, and they advocate evaluation criteria that prioritize student agency and active learning.
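To make the correlation claim concrete, here is a minimal sketch of how a correlation between two leaderboard columns could be computed. It assumes a Pearson coefficient, which the summary does not confirm, and the score lists are hypothetical placeholders, not MathTutorBench data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-model scores (assumption: one entry per model, same order).
solving = [0.82, 0.75, 0.91, 0.60, 0.70]
pedagogy = [0.55, 0.72, 0.50, 0.58, 0.66]

r = pearson(solving, pedagogy)
print(f"correlation between solving and pedagogy scores: {r:.3f}")
```

A coefficient well below 1 means a model's solving rank says little about its pedagogy rank, which is the motivation for reporting the two scores separately.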

llm · education · tutoring
