Agents · arXiv cs.AI · 21 h ago

SkillResolve-Bench: Measuring and Resolving Same-Capability Ambiguity in Agent Skill Retrieval

SkillResolve-Bench 1.0 is a benchmark for agent skill retrieval that targets same-capability ambiguity: cases where a helpful skill and a risky skill cover the same capability, so a retriever may surface the risky one. It provides 661 helpful/risky skill pairs and a candidate pool of 7,982. The benchmark measures how well the helpful skill is ranked and also reports the harmful sibling rate (HSR@K). The SkillResolve method achieves Recall@3 of 0.766 and NDCG@3 of 0.699, improving recall over the previous SkillRouter while reducing harmful exposure. For practitioners, the benchmark offers a structured way to measure and reduce execution risk in skill retrieval systems.
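To make the three metrics concrete, here is a minimal Python sketch of how they could be computed per query and then averaged. It rests on assumptions not confirmed by the summary: each query has exactly one helpful skill and one risky sibling, NDCG uses binary relevance with a log2 discount, and HSR@K is read as the fraction of queries whose risky sibling appears in the top K. The function names and the toy data are hypothetical, not the benchmark's API or results.

```python
import math

def recall_at_k(ranked, helpful, k):
    # 1.0 if the helpful skill appears in the top k, else 0.0
    return 1.0 if helpful in ranked[:k] else 0.0

def ndcg_at_k(ranked, helpful, k):
    # Binary relevance with a single relevant item, so the ideal DCG is 1
    # and NDCG reduces to the discounted gain at the helpful skill's rank.
    for rank, skill in enumerate(ranked[:k], start=1):
        if skill == helpful:
            return 1.0 / math.log2(rank + 1)
    return 0.0

def hsr_at_k(ranked, risky, k):
    # ASSUMED definition: 1.0 if the risky sibling is exposed in the top k
    return 1.0 if risky in ranked[:k] else 0.0

def evaluate(queries, k=3):
    # queries: list of (ranked_skill_ids, helpful_id, risky_id) tuples
    n = len(queries)
    return {
        "recall": sum(recall_at_k(r, h, k) for r, h, _ in queries) / n,
        "ndcg": sum(ndcg_at_k(r, h, k) for r, h, _ in queries) / n,
        "hsr": sum(hsr_at_k(r, x, k) for r, _, x in queries) / n,
    }

# Hypothetical toy data: two queries, each with a helpful/risky pair.
toy_queries = [
    (["a_safe", "a_risky", "x"], "a_safe", "a_risky"),
    (["y", "z", "b_safe", "b_risky"], "b_safe", "b_risky"),
]
scores = evaluate(toy_queries, k=3)
```

Under this reading, a retriever is better when recall and NDCG rise while HSR@K falls, which is why the two kinds of metric are reported together.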

agents · skill retrieval · benchmark
