Agents
DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction
The article introduces DRFLOW, a benchmark for evaluating how well agents predict personalized workflows in complex information-seeking tasks. It comprises 100 tasks across five domains, with 1,246 reference workflow steps drawn from more than 3,900 documents. The article outlines seven diagnostic metrics, including factual grounding and step recovery, and presents DRFLOW-Agent (DRFA), which improves average F1 score by up to 10.02% over baseline agents. For practitioners, the benchmark shows that predicting accurate, actionable workflows remains difficult, which points to the need for further advances in deep research systems.
workflow, personalized, prediction, benchmark