Training
Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models
The paper introduces Causal Attribution Pruning (CAP), a training-free method for pruning large language models (LLMs) that identifies critical attention heads by their causal impact on reasoning tasks. At 20% sparsity, CAP reports relative accuracy gains of up to 61% on the ARC-Challenge benchmark and outperforms pruning methods such as Wanda across several models, including Llama-3-8B-Instruct and Mistral-7B-Instruct. The results suggest that causal analysis can guide pruning so that practitioners reduce inference costs without sacrificing reasoning performance.
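To make the idea of ranking heads by causal impact concrete, here is a minimal sketch of ablation-based head attribution on a toy attention layer. It is not the paper's CAP procedure, whose details are not given in this summary. The random weights, the MSE loss standing in for a reasoning metric, and the helper names (head_outputs, task_loss, ablation_scores, heads_to_prune) are all assumptions made for illustration.

```python
import numpy as np

def head_outputs(x, wq, wk, wv):
    """Per-head attention outputs for one sequence; x is (seq, d_model)."""
    outs = []
    for q_w, k_w, v_w in zip(wq, wk, wv):
        q, k, v = x @ q_w, x @ k_w, x @ v_w
        scores = q @ k.T / np.sqrt(q.shape[-1])
        scores -= scores.max(axis=-1, keepdims=True)  # stable softmax
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)
        outs.append(attn @ v)
    return np.stack(outs)  # (n_heads, seq, d_head)

def task_loss(heads, wo, target, mask):
    """MSE of the head-summed output; mask[h] == 0 ablates head h."""
    y = np.einsum("h,hsd,hdm->sm", mask, heads, wo)
    return float(np.mean((y - target) ** 2))

def ablation_scores(heads, wo, target):
    """Causal impact of each head: loss increase when that head is removed."""
    n = heads.shape[0]
    base = task_loss(heads, wo, target, np.ones(n))
    scores = np.empty(n)
    for h in range(n):
        mask = np.ones(n)
        mask[h] = 0.0
        scores[h] = task_loss(heads, wo, target, mask) - base
    return scores

def heads_to_prune(scores, sparsity):
    """Indices of the lowest-impact heads at the requested sparsity."""
    k = int(round(sparsity * len(scores)))
    return np.argsort(scores)[:k]

# Toy setup (all values hypothetical): 10 heads, two of them nearly inert.
rng = np.random.default_rng(0)
n_heads, d_model, d_head, seq = 10, 16, 4, 6
x = rng.normal(size=(seq, d_model))
wq, wk, wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
wo = rng.normal(size=(n_heads, d_head, d_model))
wo[:2] *= 1e-3  # heads 0 and 1 contribute almost nothing to the output

heads = head_outputs(x, wq, wk, wv)
target = np.einsum("hsd,hdm->sm", heads, wo)  # the unpruned model's output
scores = ablation_scores(heads, wo, target)
pruned = heads_to_prune(scores, sparsity=0.20)
print(sorted(pruned.tolist()))
```

In this toy setup the two down-weighted heads receive the smallest ablation scores and are the ones selected at 20% sparsity. A real pipeline would measure the effect of each ablation on task accuracy or logits over a calibration set rather than on a synthetic target.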
pruning, LLM, reasoning