ai-digest.dev
Research · arXiv cs.AI · 21 h ago

A Geometric Account of Activation Steering through Angle-Norm Decomposition

The paper gives a geometric analysis of activation steering in language models, treating each steering intervention in terms of two quantities: the angular alignment of the hidden state and the hidden-state norm. In experiments on seven language models, the authors find that concepts are represented predominantly in the angular structure, which motivates spherical steering methods. They also note that keeping the norm stable is crucial for steering to be effective. The work therefore argues for a change in how activation steering is parameterized: separate angular and radial components, with the aim of improving interpretability and control over model behavior.
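To make the angle-norm idea concrete, here is a minimal sketch in plain Python. It is not the paper's method: the function names (`decompose`, `additive_steer`, `spherical_steer`) and the interpolation parameter `t` are hypothetical, and spherical interpolation (slerp) is used as one simple stand-in for a norm-preserving, angle-only intervention. It contrasts this with ordinary additive steering, which changes direction and norm together.

```python
import math

def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in x))

def decompose(h):
    """Split a nonzero vector into (radial part, angular part): its norm and unit direction."""
    r = norm(h)
    return r, [a / r for a in h]

def angle_between(a, b):
    """Angle in radians between two nonzero vectors."""
    cos = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error

def additive_steer(h, v, alpha):
    """Conventional steering: h + alpha * v. Changes angle and norm at once."""
    return [a + alpha * b for a, b in zip(h, v)]

def spherical_steer(h, v, t):
    """Illustrative angle-only steering (an assumption, not the paper's algorithm).

    Rotates the direction of h toward v by a fraction t of the angle between
    them (slerp) and then restores the original norm of h.
    """
    r, u = decompose(h)
    _, w = decompose(v)
    theta = angle_between(u, w)
    s = math.sin(theta)
    if s < 1e-8:
        # Parallel or antiparallel: the rotation plane is undefined, leave h as is.
        return list(h)
    a = math.sin((1 - t) * theta) / s
    b = math.sin(t * theta) / s
    return [r * (a * x + b * y) for x, y in zip(u, w)]

h = [3.0, 4.0, 0.0]   # toy hidden state
v = [0.0, 0.0, 1.0]   # toy concept direction
rotated = spherical_steer(h, v, 0.5)
shifted = additive_steer(h, v, 5.0)
print(norm(h), norm(rotated), norm(shifted))
print(angle_between(h, v), angle_between(rotated, v), angle_between(shifted, v))
```

In this toy case both interventions move the hidden state halfway toward the concept direction in angle, but only the spherical variant leaves the norm untouched, which is the kind of separation between angular and radial effects the summary describes.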

activation steering · geometry · llm