Agentic Retrieval and Reinforcement Learned Equation Chains: A Controlled Generation Framework for Complex and Novel Physics Word Problems
The article presents ARVRE (Agentic Retrieval Value Reinforced Equation-chain), a two-stage framework for generating Physics Word Problems (PWPs) that are complex, novel, and mathematically valid. In the first stage, offline temporal-difference learning produces valid chains of physics equations, and an agentic retrieval-augmented generation (RAG) component selects relevant concepts and vocabulary. In the second stage, a Large Language Model (LLM) converts these elements into natural-language questions. The resulting PWPs are more complex and more solvable than those produced by previous methods, which demonstrates the value of combining reinforcement learning, retrieval mechanisms, and LLMs for educational content generation.
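To make the equation-chain idea concrete, here is a minimal toy sketch of learning chain values offline with a temporal-difference update. It is not the article's implementation. Everything in it is an assumption made for illustration: the five-equation bank `EQUATIONS`, the reward (+1 when consecutive equations share a variable, -1 otherwise), the use of tabular Q-learning as the TD variant, and a state that records only the most recent equation.

```python
import random

# Hypothetical toy equation bank: name -> set of variables it relates.
EQUATIONS = {
    "newton_second_law": {"F", "m", "a"},
    "kinematics_velocity": {"v", "u", "a", "t"},
    "kinetic_energy": {"KE", "m", "v"},
    "work_done": {"W", "F", "d"},
    "ohms_law": {"V", "I", "R"},
}
NAMES = sorted(EQUATIONS)


def reward(prev, nxt):
    """Assumed reward: +1 if the two equations share a variable, else -1."""
    return 1.0 if EQUATIONS[prev] & EQUATIONS[nxt] else -1.0


def collect_offline_transitions(n, rng):
    """Log (state, action, reward, next_state) tuples from a random policy."""
    data = []
    for _ in range(n):
        s = rng.choice(NAMES)
        a = rng.choice([e for e in NAMES if e != s])
        data.append((s, a, reward(s, a), a))
    return data


def train_q(data, epochs=50, alpha=0.1, gamma=0.9):
    """Tabular Q-learning (a TD control update) over the fixed dataset."""
    q = {(s, a): 0.0 for s in NAMES for a in NAMES if a != s}
    for _ in range(epochs):
        for s, a, r, s2 in data:
            best_next = max(q[(s2, a2)] for a2 in NAMES if a2 != s2)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


def greedy_chain(q, start, length):
    """Roll out the greedy policy, never revisiting an equation."""
    chain = [start]
    while len(chain) < length:
        s = chain[-1]
        candidates = [a for a in NAMES if a not in chain]
        if not candidates:
            break
        chain.append(max(candidates, key=lambda a: q[(s, a)]))
    return chain


if __name__ == "__main__":
    rng = random.Random(0)
    q_table = train_q(collect_offline_transitions(2000, rng))
    print(greedy_chain(q_table, "work_done", 4))
```

Because the logged transitions are collected once and never refreshed, the learning step is offline in the sense the summary describes. In this toy bank the learned values steer the rollout away from `ohms_law`, which shares no variable with the mechanics equations. A realistic system would need a richer state (for example, the full chain so far) and a validity check that goes beyond shared variables.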