Research · arXiv cs.CL · 16 d ago

Leverage Is Not Reach: A Control-Window Law for Single-Neuron Steering in Language Models

The paper introduces a control-window framework for single-neuron steering in aligned language models. Its central claim is that control stays coherent as long as the behavior trigger remains below a defined collapse ceiling. The framework quantifies the relationship between the residual stream and neuron writes, and the authors report that coherent control can be predicted with a mean absolute error of 0.14 across neurons. For practitioners, the work offers a theoretical basis for steering model behavior through targeted neuron interventions and for understanding controllability in language models.

Tags: neuron, control, language models
