Research · arXiv cs.AI · 9 d ago

Not All Skills Help: Measuring and Repairing Agent Knowledge

The paper introduces ASSAY, a framework for curating the natural-language skills used by LLM agents by measuring each skill's causal contribution across tasks. Using randomized masking, ASSAY identifies skills that hurt performance and restructures the skill library offline, with no weight updates. Reported gains include a 47.4% relative improvement in task-goal completion for DeepSeek-V3 on AppWorld and an 8.7% relative improvement for GPT-4.1 on tau-bench. The results indicate that whether a skill helps depends on the task, giving practitioners a way to improve agent performance by curating the skill library rather than retraining the model.
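To make the randomized-masking idea concrete, here is a minimal sketch and not ASSAY's actual implementation. It assumes each skill is independently kept or masked per trial and uses a simple difference in mean scores as a stand-in estimator of a skill's contribution. The function names, the toy skills, and `toy_run_task` (a placeholder for a real agent rollout) are all hypothetical.

```python
import random
from statistics import mean


def estimate_skill_effects(skills, run_task, n_trials=2000, keep_prob=0.5, seed=0):
    """Estimate each skill's contribution by randomly masking skills.

    Assumption: a difference in mean score (skill kept vs. skill masked)
    is used as the estimator; the paper's own estimator may differ.
    """
    rng = random.Random(seed)
    kept = {s: [] for s in skills}
    masked = {s: [] for s in skills}

    for _ in range(n_trials):
        # Independently keep or mask every skill for this trial.
        mask = {s: rng.random() < keep_prob for s in skills}
        active = [s for s in skills if mask[s]]
        score = run_task(active)
        for s in skills:
            (kept if mask[s] else masked)[s].append(score)

    # Only report skills observed in both conditions.
    return {
        s: mean(kept[s]) - mean(masked[s])
        for s in skills
        if kept[s] and masked[s]
    }


def toy_run_task(active_skills):
    """Hypothetical stand-in for an agent rollout; returns a task score."""
    score = 0.5
    if "check_api_docs" in active_skills:
        score += 0.3  # toy helpful skill
    if "always_retry" in active_skills:
        score -= 0.2  # toy harmful skill
    return score


skills = ["check_api_docs", "always_retry", "neutral_tip"]
effects = estimate_skill_effects(skills, toy_run_task)

# Flag skills whose estimated effect is clearly negative (threshold is arbitrary).
harmful = [s for s, e in effects.items() if e < -0.05]
print(effects)
print("candidates to remove or rewrite:", harmful)
```

In a real setting `run_task` would run the agent on a benchmark task with only the active skills in its prompt, and effects would be estimated per task or task group, since a skill that helps on one task may hurt on another.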

agent knowledge · skills · causal contributions
