Safety
Order Is Not Control
The paper presents a theoretical framework that distinguishes order from control in AI systems, arguing that control requires a receiver-gated response law, which maps states and actions to response displacements. It reports empirical evidence from biological models and large language models (LLMs), showing that response vectors can be predicted with up to 84.8% accuracy on nonzero components. For practitioners, the work highlights the role of local control mechanisms and stochastic response operators in designing and implementing AI systems, particularly for improving the interpretability and alignment of LLMs.
ai alignment, interpretability