Size Doesn't Matter: Cosine-Scored Sparse Autoencoders

2026-06-13 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Cosine-scored sparse autoencoders (SAEs) address a limitation of standard SAEs: feature activation is based on an inner product, so it scales with both directional alignment and input norm. This is a problem because sublayer normalization in the underlying model discards magnitude, so a standard SAE detects a quantity the model does not use and wastes dictionary slots on "norm detectors." The proposed method replaces the inner-product score with a learned blend of cosine similarity and input magnitude, letting the optimizer decide how much of the norm to use, either globally or per feature. In the reported experiments, the optimizer consistently settles on a magnitude dependence of less than half. At matched reconstruction, the cosine encoder learns features that align with human-recognizable concepts more often than a standard SAE does, and it uses dictionary slots more efficiently. The forward-pass score geometry is identified as the primary lever for this improvement, which suggests cosine scoring as a default for dictionary learning on normalized representations, although its advantage does not hold for every task or depth.

Key takeaway

Machine learning engineers building sparse autoencoders on normalized representations should consider adopting cosine scoring as the default. It improves feature interpretability and dictionary slot utilization by decoupling feature activation from input magnitude, which sublayer normalization discards. The result can be more meaningful, human-recognizable learned features, even though the advantage is not universal across tasks.

Key insights

Cosine-scored sparse autoencoders improve feature learning by decoupling activation from input norm, aligning features with human concepts.

Principles

Standard SAEs waste dictionary slots on norm detection.
Sublayer normalization discards magnitude, making norm-dependent scoring inefficient.
Learned cosine-magnitude blend improves feature interpretability.
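
The first two principles can be checked numerically. The sketch below is an illustration, not code from the original article: `rms_normalize` is a simplified stand-in for a sublayer normalization (no learned gain or bias), and the vectors are random. It shows that rescaling the input leaves the normalized activation and the cosine score unchanged, while the inner-product score grows in proportion to the scale.

```python
import numpy as np

def rms_normalize(x, eps=1e-8):
    """Scale x to unit root-mean-square (simplified stand-in for sublayer normalization)."""
    return x / np.sqrt(np.mean(x ** 2) + eps)

def inner_product_score(w, x):
    """Standard SAE-style score: depends on direction and on the norm of x."""
    return float(w @ x)

def cosine_score(w, x, eps=1e-8):
    """Cosine similarity: depends on direction only."""
    return float(w @ x / (np.linalg.norm(w) * np.linalg.norm(x) + eps))

rng = np.random.default_rng(0)
x = rng.normal(size=64)   # hypothetical activation vector
w = rng.normal(size=64)   # hypothetical encoder direction
scale = 10.0

# How much the normalized activation changes when the input is rescaled.
normed_diff = float(np.max(np.abs(rms_normalize(x) - rms_normalize(scale * x))))
# How much each score changes under the same rescaling.
ip_ratio = inner_product_score(w, scale * x) / inner_product_score(w, x)
cos_diff = abs(cosine_score(w, scale * x) - cosine_score(w, x))

print(f"change in normalized activation: {normed_diff:.2e}")
print(f"inner-product score ratio:       {ip_ratio:.2f}")
print(f"change in cosine score:          {cos_diff:.2e}")
```

The inner-product score therefore tracks a quantity that the normalized downstream computation never sees, which is the mismatch the summary describes.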

Method

Replace the standard inner product score in SAEs with a learned blend of cosine similarity and input magnitude, allowing global or per-feature optimization of norm dependence.
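
A minimal forward-pass sketch of such a blend follows. The summary does not give the exact parameterization, so the form used here, cos(w, x) multiplied by ||x|| raised to a power `alpha`, is an assumption chosen for illustration, as are the names `blended_scores` and `encode`. Under this assumed form, `alpha = 0` gives pure cosine scoring, and `alpha = 1` with unit-norm encoder rows recovers the standard inner product. In training, `alpha` would be a learned parameter; it is fixed here to keep the example short.

```python
import numpy as np

def blended_scores(X, W, alpha, eps=1e-8):
    """Assumed blend: cos(w_j, x_i) * ||x_i|| ** alpha_j.

    X: (batch, d) inputs.
    W: (n_features, d) encoder directions.
    alpha: scalar (global) or (n_features,) array (per-feature) exponent.
    """
    x_norm = np.linalg.norm(X, axis=1, keepdims=True)      # (batch, 1)
    w_norm = np.linalg.norm(W, axis=1, keepdims=True)      # (n_features, 1)
    cos = (X / (x_norm + eps)) @ (W / (w_norm + eps)).T     # (batch, n_features)
    # Broadcasting applies one exponent per feature column (or a single global one).
    return cos * x_norm ** np.asarray(alpha)

def encode(X, W, b, alpha):
    """ReLU encoder on top of the blended score (hypothetical SAE encoder)."""
    return np.maximum(blended_scores(X, W, alpha) + b, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))
W = rng.normal(size=(8, 16))
W_unit = W / np.linalg.norm(W, axis=1, keepdims=True)
b = np.zeros(8)

pure_cosine = blended_scores(X, W, alpha=0.0)          # ignores input norm
inner_like = blended_scores(X, W_unit, alpha=1.0)      # matches X @ W_unit.T
per_feature = encode(X, W, b, alpha=np.linspace(0.0, 0.5, 8))
```

Passing a scalar `alpha` corresponds to the global setting and passing one value per feature to the per-feature setting mentioned above.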

In practice

Implement cosine scoring for SAEs on normalized representations.
Prioritize feature interpretability in dictionary learning.
Evaluate cosine encoders per task and depth, since the advantage is not universal.

Topics

Sparse Autoencoders
Feature Learning
Cosine Similarity
Neural Network Interpretability
Representation Learning
Machine Learning

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.