Research
A Geometric Account of Activation Steering through Angle-Norm Decomposition
The paper presents a geometric analysis of activation steering in language models, examining the roles of both angular alignment and hidden-state norm in steering interventions. In empirical studies on seven language models, it finds that concepts are represented predominantly in angular structure, which favors spherical steering methods, while norm stability remains crucial for steering to be effective. The work proposes a change in how activation steering is parameterized: treating the angular and radial components separately to improve interpretability and control over model behavior.
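To make the angle-norm split concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names (`decompose`, `rotate_toward`, `additive_steer`), the toy 2-D vectors, and the choice of rotating within the plane spanned by the hidden state and the concept direction are all assumptions made for illustration. It contrasts a norm-preserving rotation toward a concept direction with ordinary additive steering, which changes the angle and the norm together.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def decompose(h):
    """Split h into its radial part (norm) and angular part (unit direction)."""
    r = norm(h)
    if r == 0.0:
        raise ValueError("the zero vector has no direction")
    return r, [x / r for x in h]


def angle_between(a, b):
    """Angle in radians between two non-zero vectors."""
    _, ua = decompose(a)
    _, ub = decompose(b)
    return math.acos(max(-1.0, min(1.0, dot(ua, ub))))


def rotate_toward(h, concept, theta):
    """Rotate h by theta radians toward `concept`, keeping the norm of h fixed.

    Hypothetical helper: the rotation happens in the plane spanned by h and
    the concept direction, so only the angular component changes.
    """
    r, u = decompose(h)
    _, c = decompose(concept)
    # Part of the concept direction orthogonal to u (one Gram-Schmidt step).
    proj = dot(u, c)
    w = [ci - proj * ui for ci, ui in zip(c, u)]
    wn = norm(w)
    if wn < 1e-12:
        return list(h)  # h is (anti-)parallel to concept: plane is undefined
    w = [x / wn for x in w]
    return [r * (math.cos(theta) * ui + math.sin(theta) * wi)
            for ui, wi in zip(u, w)]


def additive_steer(h, concept, alpha):
    """Standard additive steering: h + alpha * concept (norm is not preserved)."""
    return [hi + alpha * ci for hi, ci in zip(h, concept)]


h = [3.0, 4.0]        # toy hidden state, norm 5
concept = [0.0, 1.0]  # toy concept direction

rotated = rotate_toward(h, concept, 0.3)
added = additive_steer(h, concept, 5.0)

print("norm before:", norm(h))
print("norm after rotation:", norm(rotated))
print("norm after addition:", norm(added))
print("angle before:", angle_between(h, concept))
print("angle after rotation:", angle_between(rotated, concept))
```

In this sketch the rotation reduces the angle to the concept direction by exactly `theta` while leaving the norm untouched, whereas the additive update moves both at once. A real intervention would operate on high-dimensional residual-stream activations inside a model's forward pass, which this toy example does not attempt.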
activation steering, geometry, llm