Training · arXiv cs.CL · 16 d ago

Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models

The paper introduces Causal Attribution Pruning (CAP), a training-free method for pruning large language models (LLMs) that identifies critical attention heads by their causal impact on reasoning tasks. The paper reports that CAP achieves up to 61% relative accuracy gains on the ARC-Challenge benchmark at 20% sparsity and outperforms pruning methods such as Wanda across several models, including Llama-3-8B-Instruct and Mistral-7B-Instruct. The results suggest that causal analysis can guide pruning so that practitioners reduce inference costs while preserving reasoning capability.
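
The summary does not describe how CAP actually computes causal impact, so the following is only a minimal sketch of the general idea, assuming single-head ablation as a stand-in attribution rule. Everything here is hypothetical and not taken from the paper: the toy model (a sum of per-head contributions), the function names, the weights, and the data.

```python
# Toy illustration only: NOT the paper's CAP procedure.
# Assumption: a head's "causal score" is the loss increase when that head
# alone is ablated (zeroed out); the lowest-scoring heads are pruned.

def model(x, weights, mask):
    """Hypothetical model: each head i contributes weights[i] * x unless masked."""
    return sum(m * w * x for m, w in zip(mask, weights))

def loss(weights, mask, data):
    """Mean squared error over (x, y) pairs."""
    return sum((model(x, weights, mask) - y) ** 2 for x, y in data) / len(data)

def causal_scores(weights, data):
    """Loss increase caused by ablating each head individually."""
    full = [1] * len(weights)
    base = loss(weights, full, data)
    scores = []
    for i in range(len(weights)):
        ablated = full.copy()
        ablated[i] = 0  # intervene on head i only
        scores.append(loss(weights, ablated, data) - base)
    return scores

def prune_mask(scores, sparsity):
    """Return a 0/1 keep-mask that drops the lowest-scoring fraction of heads."""
    n_prune = int(len(scores) * sparsity)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(scores))]

# Ten hypothetical heads; targets come from the unpruned model itself.
weights = [2.0, 0.01, -1.5, 0.02, 1.0, 0.5, -0.03, 0.8, 1.2, -0.9]
inputs = [0.5, 1.0, 1.5, 2.0]
data = [(x, model(x, weights, [1] * len(weights))) for x in inputs]

scores = causal_scores(weights, data)
mask = prune_mask(scores, sparsity=0.2)  # 20% sparsity: drop 2 of 10 heads
print(mask)
```

In this toy setup the two heads with near-zero weights receive the smallest scores and are the ones removed. A real model would need ablation (or a cheaper approximation of it) measured on held-out reasoning examples, and head interactions would make single-head scores only a rough guide.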

Tags: pruning, LLM, reasoning
