Research
Rethinking Scaffolding in LLM Tutors: The Interactional Mismatch Between Benchmarks and Real-World Deployments
The paper presents a new evaluation pipeline for assessing scaffolding in AI educational chatbots. It introduces two metrics, Chatbot Scaffolding and Student Uptake, and applies them across nine datasets totaling 9,490 chats. The findings indicate a significant discrepancy between the high scaffolding that benchmarks assume and actual student interactions, where students often bypass the chatbot's guidance to pursue their own learning objectives. The research underscores the need for future benchmarks to account for variability in student engagement and in chatbot effectiveness in real-world educational contexts.
llm, tutoring, scaffolding