arXiv cs.CL June 30, 2026 · Papers

Turn-Averaged SAEs for Feature Discovery and Long-Context Attribution

arXiv:2606.28548v1 · Announce Type: new

Abstract: Sparse autoencoders (SAEs) have become a useful tool for extracting interpretable features in language models. However, standard SAE architectures operate on individual token activations, meaning that the number of active features scales linearly with context length, and
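The abstract's scaling claim can be illustrated with a minimal sketch. It assumes a bare TopK-style SAE encoder (exactly `k` features kept per token, no ReLU), random weights, and hypothetical sizes (`d_model`, `d_sae`, `k`); the abstract does not say which baseline architecture the paper uses. Because each token is encoded independently, as in a standard per-token SAE, the number of active (token, feature) pairs grows linearly with context length. This sketch is not the paper's turn-averaged method, which the excerpt does not describe.

```python
import random


def topk_sae_encode(x, W_enc, b_enc, k):
    """Encode one token activation with a bare TopK SAE encoder (assumed variant).

    x: list of d_model floats; W_enc: d_sae rows of d_model floats;
    b_enc: d_sae bias floats. Returns {feature_index: activation}
    holding exactly min(k, d_sae) entries.
    """
    # Pre-activations: one dot product plus bias per dictionary feature.
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W_enc, b_enc)]
    # Keep only the k largest pre-activations; all other features are inactive.
    top = sorted(range(len(pre)), key=lambda j: pre[j], reverse=True)[:k]
    return {j: pre[j] for j in top}


def count_active_features(context, W_enc, b_enc, k):
    """Per-token SAE: encode every token separately and count active (token, feature) pairs."""
    return sum(len(topk_sae_encode(x, W_enc, b_enc, k)) for x in context)


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_sae, k = 16, 64, 4  # hypothetical sizes, for illustration only
    W_enc = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_sae)]
    b_enc = [0.0] * d_sae
    for T in (8, 16, 32):
        context = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(T)]
        # Each token contributes exactly k active features, so the total is k * T.
        print(T, count_active_features(context, W_enc, b_enc, k))
```

With `k = 4`, the printed totals are 32, 64, and 128 for contexts of 8, 16, and 32 tokens: doubling the context doubles the number of active features, which is the per-token behavior the abstract points to.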
