Rational Sparse Autoencoder
The Rational Sparse Autoencoder (RSAE) replaces the fixed nonlinearity of a sparse autoencoder with a trainable rational function, giving the model more flexibility in the trade-off between reconstruction and sparsity. RSAE is trained in a two-stage pipeline: it is initialized from pre-trained weights, then fine-tuned under a sparsity-regularized reconstruction objective. Across several language models and activation families, this yields better reconstruction and downstream metrics while preserving interpretability. Because the rational functions add few parameters and little computational overhead, RSAE is a practical option for mechanistic interpretability work.
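To make the idea concrete, the following is a minimal sketch of a forward pass and loss for an autoencoder whose encoder activation is a rational function P(x)/Q(x). It is an illustration under stated assumptions, not the RSAE implementation: the text above does not specify the form of the rational function, so the denominator Q(x) = 1 + |b_1 x + b_2 x^2 + ...| (a common way to avoid poles), the squared-error reconstruction term, the L1 sparsity penalty, and all names (`rational`, `encode`, `decode`, `sae_loss`, `l1_coeff`) are hypothetical choices. The coefficients `a` and `b` are the quantities that would be trained during fine-tuning; no training loop is shown.

```python
from typing import List, Sequence


def rational(x: float, a: Sequence[float], b: Sequence[float]) -> float:
    """Evaluate P(x) / Q(x).

    P(x) = a[0] + a[1] x + a[2] x^2 + ...
    Q(x) = 1 + |b[0] x + b[1] x^2 + ...|   (assumed pole-free form)
    """
    num = sum(a_j * x ** j for j, a_j in enumerate(a))
    den = 1.0 + abs(sum(b_k * x ** (k + 1) for k, b_k in enumerate(b)))
    return num / den


def encode(x: Sequence[float], W_enc: Sequence[Sequence[float]],
           b_enc: Sequence[float], a: Sequence[float],
           b: Sequence[float]) -> List[float]:
    """Linear map followed by the rational activation, applied per latent."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + bias
           for row, bias in zip(W_enc, b_enc)]
    return [rational(p, a, b) for p in pre]


def decode(z: Sequence[float], W_dec: Sequence[Sequence[float]],
           b_dec: Sequence[float]) -> List[float]:
    """Linear reconstruction of the input from the latent code."""
    return [sum(w * zi for w, zi in zip(row, z)) + bias
            for row, bias in zip(W_dec, b_dec)]


def sae_loss(x: Sequence[float], x_hat: Sequence[float],
             z: Sequence[float], l1_coeff: float) -> float:
    """Squared reconstruction error plus an L1 sparsity penalty (assumed)."""
    recon = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    sparsity = sum(abs(zi) for zi in z)
    return recon + l1_coeff * sparsity


# Toy usage with identity encoder/decoder weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
a_coeffs = [0.0, 1.0]  # P(x) = x
b_coeffs = [1.0]       # Q(x) = 1 + |x|

x_in = [1.0, -3.0]
z_lat = encode(x_in, identity, zeros, a_coeffs, b_coeffs)
x_rec = decode(z_lat, identity, zeros)
loss_value = sae_loss(x_in, x_rec, z_lat, l1_coeff=0.1)
```

With `a = [0.0, 1.0]` and an empty `b`, the function reduces to the identity. Initializing the coefficients so that the rational function approximates the original fixed nonlinearity would be one way to start fine-tuning from pre-trained weights without changing the model's initial behavior, though the text above does not say whether RSAE does this.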