Safety
Actionable Interpretability Must Be Defined in Terms of Symmetries
The paper proposes a framework that defines actionable interpretability in AI in terms of symmetries. It introduces four symmetries (inference equivariance, information invariance, concept-closure invariance, and structural invariance) that can formalize interpretable models as a subset of probabilistic models and unify various forms of interpretable inference through Bayesian inversion. The approach aims to support the formal testing and design of interpretable AI systems, which makes it relevant for practitioners concerned with compliance with safety standards and with model transparency.
interpretability, ai, symmetries