Research
Not All Skills Help: Measuring and Repairing Agent Knowledge
The paper introduces ASSAY, a framework for curating the natural-language skills of LLM agents by measuring each skill's causal contribution across tasks. Using randomized masking, ASSAY identifies skills that hurt performance and restructures the skill library offline, without any weight updates. Reported gains include a 47.4% relative improvement in task-goal completion for DeepSeek-V3 on AppWorld and an 8.7% relative improvement for GPT-4.1 on tau-bench. The results highlight that whether a skill helps depends on the task, and they give practitioners a measurement-based way to curate agent skill libraries.
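To make the idea of randomized masking concrete, the following is a minimal sketch and not ASSAY's actual procedure. It assumes each skill is independently kept or hidden on every trial, and it estimates a skill's effect as the difference in mean task score between trials where the skill was visible and trials where it was masked. The names `estimate_skill_effects`, `toy_run_task`, and the skill labels are hypothetical, and the toy scoring function stands in for a real agent rollout.

```python
import random
from statistics import mean


def estimate_skill_effects(skills, run_task, n_trials=200, keep_prob=0.5, seed=0):
    """Estimate each skill's effect as mean(score | shown) - mean(score | masked).

    Hypothetical helper: `run_task` takes the set of visible skills and
    returns a numeric score for one task attempt.
    """
    rng = random.Random(seed)
    shown = {s: [] for s in skills}
    masked = {s: [] for s in skills}

    for _ in range(n_trials):
        # Randomly mask skills, independently of one another.
        active = {s for s in skills if rng.random() < keep_prob}
        score = run_task(active)
        for s in skills:
            (shown if s in active else masked)[s].append(score)

    # Only report skills that were observed in both conditions.
    return {
        s: mean(shown[s]) - mean(masked[s])
        for s in skills
        if shown[s] and masked[s]
    }


def toy_run_task(active):
    """Stand-in for an agent rollout (assumed effects, not real data)."""
    score = 0.5
    if "helpful" in active:
        score += 0.3
    if "harmful" in active:
        score -= 0.2
    return score


skills = ["helpful", "harmful", "neutral"]
effects = estimate_skill_effects(skills, toy_run_task)

# One simple repair rule (an assumption): drop skills with a negative estimate.
kept = [s for s in skills if effects.get(s, 0.0) >= 0.0]
```

Because masking is randomized, the difference in means is an estimate of each skill's average effect rather than a mere correlation with success. A real pipeline would also need per-task estimates and some measure of uncertainty before removing or rewriting a skill.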
agent knowledge, skills, causal contributions