Research · arXiv cs.AI · 15 days ago

How Linear Is a Transformer Feed-Forward Block? Per-Block Linear Recoverability Is Learned, Not Architectural

The paper asks how linear Transformer feed-forward network (FFN) blocks are. It measures each block's linear recoverability, R^2_lin, which roughly captures how well a linear map of the block's input reproduces its output. The measure is applied across several models, including GPT-2, Pythia-160m, and Llama-160m. R^2_lin varies widely from block to block, from near-linear to strongly nonlinear. This indicates that linearity is learned during training rather than fixed by the architecture. The finding bears on model compression and optimization. Blocks with high recoverability could be replaced by simpler architectures without substantial performance loss. Low-recoverability blocks need careful handling to avoid degrading the model.
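The summary does not give the exact formula for R^2_lin. A natural reading, and the assumption in this sketch, is the coefficient of determination of a least-squares affine map. The map is fitted from a block's input activations to its output activations, and the variance is pooled over output dimensions: R^2_lin = 1 - SS_res / SS_tot. The paper may define it differently, for example on held-out tokens or averaged per dimension. The helper names `linear_recoverability` and `make_ffn` are hypothetical. So are the toy random FFNs, which stand in for real model blocks. The sketch illustrates why the score depends on learned weights rather than on the block's shape. Two blocks with identical dimensions can score very differently. One block is purely linear. The other applies a zero-bias ReLU, and that ReLU's |h| component cannot be recovered linearly.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n list of lists, b is an n x m list of lists; returns w (n x m).
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("singular system")
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def linear_recoverability(xs, ys, ridge=1e-8):
    """Assumed R^2_lin: fraction of output variance explained by an affine map of inputs.

    xs: N input vectors (block inputs); ys: N output vectors (block outputs).
    """
    xa = [list(x) + [1.0] for x in xs]  # append a constant column for the bias term
    d, d_out, n = len(xa[0]), len(ys[0]), len(xa)
    # Normal equations (X^T X + ridge * I) W = X^T Y; the tiny ridge guards against singularity.
    xtx = [[sum(row[i] * row[j] for row in xa) for j in range(d)] for i in range(d)]
    for i in range(d):
        xtx[i][i] += ridge
    xty = [[sum(xa[t][i] * ys[t][k] for t in range(n)) for k in range(d_out)]
           for i in range(d)]
    w = solve_linear_system(xtx, xty)
    # Pool residual and total sums of squares over all output dimensions.
    means = [sum(y[k] for y in ys) / n for k in range(d_out)]
    ss_res = ss_tot = 0.0
    for x, y in zip(xa, ys):
        for k in range(d_out):
            pred = sum(x[i] * w[i][k] for i in range(d))
            ss_res += (y[k] - pred) ** 2
            ss_tot += (y[k] - means[k]) ** 2
    return 1.0 - ss_res / ss_tot


def make_ffn(d_in, d_hidden, d_out, act, rng):
    """Hypothetical toy FFN block y = W2 act(W1 x) with random weights and no biases."""
    w1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
    w2 = [[rng.gauss(0, 1) / math.sqrt(d_hidden) for _ in range(d_hidden)]
          for _ in range(d_out)]

    def ffn(x):
        h = [act(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
        return [sum(wj * hj for wj, hj in zip(row, h)) for row in w2]

    return ffn


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(500)]
    for name, act in [("identity", lambda z: z), ("relu", lambda z: max(z, 0.0))]:
        ffn = make_ffn(4, 16, 4, act, random.Random(1))
        ys = [ffn(x) for x in xs]
        print(name, round(linear_recoverability(xs, ys), 3))
```

The identity block should score essentially 1. The ReLU block should score clearly lower, even though both blocks have the same shape. Only the fitted least-squares map is compared here. Measuring a real model would mean collecting the actual inputs and outputs of each FFN block over a token sample.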

Tags: transformers · feed-forward · linear-recoverability · gpt
