Size Doesn't Matter: Cosine-Scored Sparse Autoencoders

Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Cosine-scored sparse autoencoders (SAEs) address a limitation of standard SAEs: because feature activation is based on an inner product, it scales with both directional alignment and input norm. This is a problem because sublayer normalization in the model discards magnitude, so a standard SAE detects a quantity the model does not use and wastes dictionary slots on "norm detectors." The proposed method replaces the inner-product score with a learned blend of cosine similarity and input magnitude, letting the optimizer decide how much of the norm to use, either globally or per feature. In experiments, the optimizer consistently chooses a magnitude dependence below one half. At matched reconstruction, the cosine encoder learns features that align with human-recognizable concepts more often than a standard SAE does, making more efficient use of dictionary slots. The geometry of the forward-pass score is identified as the primary lever for this improvement, which suggests cosine scoring as a default for dictionary learning on normalized representations, although its advantage does not hold for every task or depth.
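
The dependence on input norm follows from the standard identity for the inner product. As a short worked equation (the symbols w for a feature's encoder direction, x for the input, and θ for the angle between them are notation introduced here, not taken from the original article):

```latex
\langle w, x \rangle = \lVert w \rVert \, \lVert x \rVert \cos\theta,
\qquad
\cos\theta = \frac{\langle w, x \rangle}{\lVert w \rVert \, \lVert x \rVert}.
% Rescaling the input by c > 0:
\langle w, c\,x \rangle = c \, \langle w, x \rangle,
\qquad
\cos\angle(w, c\,x) = \cos\angle(w, x).
```

Rescaling the input multiplies the inner-product score by the same factor but leaves the cosine unchanged, which is why a cosine score ignores the magnitude that normalization discards.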

Key takeaway

Machine learning engineers building sparse autoencoders on normalized representations should consider adopting cosine scoring as the default. Decoupling feature activation from input magnitude, which sublayer normalization often discards, improves feature interpretability and dictionary slot utilization. The learned features are more likely to be meaningful and human-recognizable, even though the advantage is not universal across tasks.

Key insights

Cosine-scored sparse autoencoders improve feature learning by decoupling activation from input norm, aligning features with human concepts.

Principles

Method

Replace the standard inner product score in SAEs with a learned blend of cosine similarity and input magnitude, allowing global or per-feature optimization of norm dependence.
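
The summary does not give the exact form of the blend, so the following is only a minimal sketch under a stated assumption: the score is taken to be cos(x, w) · ||x||^alpha, where alpha = 1 recovers the inner product against unit-norm encoder directions and alpha = 0 is pure cosine. The names `blended_scores`, `encode`, `W`, `b`, and `alpha` are hypothetical, and the original method may parametrize the blend differently. In a real SAE, alpha would be a trainable parameter; here it is passed in as a fixed value.

```python
import numpy as np


def blended_scores(x, W, alpha, eps=1e-8):
    """Score each feature as cos(x, w) * ||x||**alpha (ASSUMED blend form).

    x:     (batch, d) input activations
    W:     (n_features, d) encoder directions
    alpha: scalar (global) or (n_features,) array (per-feature) exponent
    """
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)            # (batch, 1)
    W_unit = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)
    cos = (x / (x_norm + eps)) @ W_unit.T                         # (batch, n_features)
    alpha = np.asarray(alpha, dtype=x.dtype)
    # Broadcasts to (batch, 1) for a scalar alpha, (batch, n_features) otherwise.
    magnitude = np.power(x_norm + eps, alpha)
    return cos * magnitude


def encode(x, W, b, alpha):
    """ReLU encoder on top of the blended score (bias b is hypothetical)."""
    return np.maximum(blended_scores(x, W, alpha) + b, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 16))
    W = rng.normal(size=(8, 16))
    # With alpha = 0 the score ignores input scale entirely.
    same = np.allclose(blended_scores(x, W, 0.0), blended_scores(10.0 * x, W, 0.0))
    print("alpha=0 is scale-invariant:", same)
```

Passing a vector for `alpha` gives each feature its own norm dependence, which corresponds to the per-feature option described above; a scalar gives the global option.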

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.