Coding
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks
Claw-SWE-Bench is a new benchmark for evaluating OpenClaw-style agent harnesses on coding tasks. It comprises 350 instances across 8 programming languages and 43 repositories, and comes in two forms: the full benchmark and Claw-SWE-Bench Lite, a subset for faster validation. Results depend heavily on adapter design: OpenClaw's Pass@1 is 19.1% with a minimal adapter and 73.4% with a full adapter. The benchmark therefore emphasizes harness design and cost accounting when assessing coding agents, and gives practitioners in the AI coding domain a standardized framework for doing so.
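For readers unfamiliar with the metric, Pass@1 is commonly read as the percentage of instances resolved on a single attempt. The sketch below illustrates that reading only; the function name `pass_at_1` and the toy results are hypothetical, and the benchmark's actual scoring code is not described in this summary.

```python
from typing import Sequence


def pass_at_1(resolved: Sequence[bool]) -> float:
    """Return the percentage of instances resolved on a single attempt.

    `resolved` holds one boolean per benchmark instance (hypothetical input
    format, assumed for illustration).
    """
    if not resolved:
        raise ValueError("need at least one instance")
    # True counts as 1, False as 0, so sum() gives the number resolved.
    return 100.0 * sum(resolved) / len(resolved)


# Toy data, not benchmark results: 3 of 4 instances resolved.
toy_results = [True, False, True, True]
print(f"{pass_at_1(toy_results):.1f}%")
```

Under this reading, two harnesses evaluated on the same instances can be compared directly, which is how the minimal-adapter and full-adapter scores above are contrasted.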
benchmark, agents, coding tasks