Research
Recoverable but Not Stationary: Local Linear Structures in Weights and Activations
The paper reports findings on local linear structures in the weights and activations of pretrained models, using DistilGPT-2 and GPT-2 with LoRA adapters. It shows that learned behaviors can be manipulated through linear directions, but that these structures are dynamic rather than fixed: the useful basis changes significantly within a short training period. The work improves understanding of parameter perturbations and activation steering, and indicates that random parameter search can be justified as effective in high-dimensional spaces, a result relevant to practitioners optimizing model performance.
linear structures, weights, activation