Agents
GRACE-DS: a Guarded Reward-guided Agent Correction Environment in Data Science
GRACE-DS is an evaluation environment for pre-deployment assessment of LLM-powered AutoML agents on tabular machine learning tasks. It measures predictive performance, leakage avoidance, reproducibility, and reward alignment across realistic workflow stages. Validated on more than 7,000 episodes, it reports higher end-to-end normalized hidden-test quality than existing baselines, which makes it useful to practitioners evaluating LLM-based agents for production use.
evaluation, data-science, LLM