Agents · arXiv cs.AI — 4 d ago

CollabSkill: Evaluating Human-Agent Collaboration On Real-World Tasks

CollabSkill is a new framework for evaluating human-agent collaboration on real-world occupational tasks, where assessing performance with AI agents is difficult. It applies a Bayesian skill rating system to more than 1,500 prompts collected across 386 sessions with 93 human workers. In the resulting agent rankings, Claude Code outperforms Codex, and practical experience significantly improves human collaboration skill. The framework is intended to support systematic evaluation and development of AI agents that effectively augment human capabilities in the workplace.

human-agent collaboration · evaluation · real-world tasks
