Coding
All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code
This empirical study analyzes 86,156 test-file patches from agent-authored pull requests (PRs) across 2,807 GitHub repositories, written by coding agents such as OpenAI Codex and GitHub Copilot. It defines a syntactic taxonomy of eight oracle signal categories and finds that 80.2% of test patches lack strong verification logic, so the mere presence of test code overstates how much behavior is actually verified. Strong oracle signals are associated with higher odds of a PR being merged (OR = 1.28, p < 0.001), which supports oracle-aware quality checks when practitioners assess AI-generated contributions.
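To make the idea of an oracle signal concrete, here is a minimal sketch of a syntactic check that separates a smoke-only test from one that asserts on a result. It is an illustration only, not the paper's eight-category taxonomy: the sample tests, the function names `has_assertion` and `classify_tests`, and the rule "an `assert` statement or an `assert*` method call counts as an oracle" are all assumptions made for this example.

```python
import ast

# Hypothetical sample tests, written for this illustration only.
SAMPLE_TESTS = '''
def test_smoke_only():
    # Runs the code under test but never checks the result.
    sorted([3, 1, 2])

def test_with_oracle():
    # Compares the result against an expected value.
    assert sorted([3, 1, 2]) == [1, 2, 3]
'''


def has_assertion(func: ast.FunctionDef) -> bool:
    """Return True if the function contains an assert statement or a
    method call whose name starts with 'assert' (unittest style)."""
    for node in ast.walk(func):
        if isinstance(node, ast.Assert):
            return True
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr.startswith("assert")
        ):
            return True
    return False


def classify_tests(source: str) -> dict[str, bool]:
    """Map each test function name to whether it carries any assertion."""
    tree = ast.parse(source)
    return {
        node.name: has_assertion(node)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test")
    }


result = classify_tests(SAMPLE_TESTS)
print(result)
```

Both sample tests would pass when run, but only the second would fail if `sorted` returned a wrong answer. A purely syntactic check like this one cannot judge whether an assertion is meaningful, so it is a rough lower bar rather than a measure of verification strength.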
test-code, ai-agents, verification