Software Delegation Contracts: Measuring Reviewability in AI Coding-Agent Work
This paper presents a controlled pilot study of how explicit software delegation contracts affect the performance and reviewability of AI coding agents. The study used a TypeScript API environment with seeded defects and comprised 64 executions across two model tiers under three conditions: realistic prompts, explicit contracts, and contracts requiring evidence bundles. Explicit contracts did not improve task correctness, since all tasks passed acceptance tests, but they significantly improved reviewability metrics: evidence sufficiency increased in 22 of 30 comparisons, and reviewer ambiguity decreased. These results suggest that delegation contracts can support better oversight of AI coding tasks, at the price of additional resource costs.