The Geometry of Updates: Fisher Alignment at Vocabulary Scale
Summary
FisherSketch is a method for training-free source selection within LLM families that share a vocabulary, aimed in particular at "activation-dark" scientific string domains such as SMILES or genomics. It addresses a limitation of representation-similarity metrics, which are uninformative unless label-conditioned error geometry is taken into account. FisherSketch estimates head Fisher alignment, a measure of how similar two tasks' parameter updates are, by rewriting it as a cosine between kernel mean embeddings in a joint activation-error space. That form can be estimated in a single streaming pass, compressing each task from approximately 61 GiB to a 16 KB float32 signature, with 192 KB of per-task streaming state. On Llama-3.1-8B, it achieved 45.7% top-1 source selection across 100 domains and 66.7% top-1 in verbalizer-shift experiments where activation-only methods collapsed.
Key takeaway
For AI Scientists and ML Engineers evaluating LLM transferability: representation-similarity metrics alone are insufficient, especially in "activation-dark" settings or under verbalizer shift. Consider integrating FisherSketch to predict parameter-update compatibility, since it captures label-conditioned error geometry. This supports more reliable training-free source selection, which can reduce negative transfer and save compute before full fine-tuning. Audit any automated selection mechanism for potential bias amplification.
Key insights
Representation-only metrics fail to predict LLM update compatibility; FisherSketch measures joint activation-error geometry at vocabulary scale.
Principles
- Representation-only metrics cannot determine head Fisher alignment without error geometry assumptions.
- Head Fisher alignment is a product-kernel cosine of joint activation-error mean embeddings.
- The Kronecker independence assumption (δ=0) is violated across tested architectures.
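The second principle can be checked numerically on a toy problem. The sketch below is not the paper's code. It assumes that "head Fisher" means the empirical second moment of per-sample gradients of a linear softmax head, where each gradient is the outer product of the error e = p - y and the activation h. Under that assumption, the Frobenius cosine between two tasks' Fisher matrices equals a cosine built from the product kernel (h·h')² (e·e')². The helper names (`head_fisher`, `toy_task`, and so on) and the tiny dimensions are illustrative only.

```python
import numpy as np

def head_fisher(H, E):
    """Empirical head Fisher: mean of g g^T with g = vec(e h^T) per sample."""
    G = np.einsum("nv,nd->nvd", E, H).reshape(len(H), -1)  # per-sample head gradients
    return G.T @ G / len(H)

def fisher_cosine(FA, FB):
    """Frobenius cosine between two explicit Fisher matrices."""
    return np.sum(FA * FB) / (np.linalg.norm(FA) * np.linalg.norm(FB))

def product_kernel_mean(HA, EA, HB, EB):
    """Mean over all cross pairs of (h_a . h_b)^2 * (e_a . e_b)^2."""
    return np.mean((HA @ HB.T) ** 2 * (EA @ EB.T) ** 2)

def kernel_cosine(HA, EA, HB, EB):
    """Cosine between kernel mean embeddings in the joint activation-error space."""
    kab = product_kernel_mean(HA, EA, HB, EB)
    kaa = product_kernel_mean(HA, EA, HA, EA)
    kbb = product_kernel_mean(HB, EB, HB, EB)
    return kab / np.sqrt(kaa * kbb)

def toy_task(rng, n, d, v):
    """Random activations, a random linear head, and random labels (toy data)."""
    H = rng.normal(size=(n, d))
    logits = H @ rng.normal(size=(d, v))
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    Y = np.eye(v)[rng.integers(0, v, size=n)]
    return H, P - Y  # error = softmax probabilities minus one-hot labels

rng = np.random.default_rng(0)
HA, EA = toy_task(rng, 40, 5, 4)
HB, EB = toy_task(rng, 60, 5, 4)

exact = fisher_cosine(head_fisher(HA, EA), head_fisher(HB, EB))
via_kernel = kernel_cosine(HA, EA, HB, EB)
print(exact, via_kernel)  # the two values agree up to floating-point error
```

The kernel form never materializes the (d·v) × (d·v) Fisher matrix, which is what makes a compact estimate plausible at vocabulary scale. It also shows why activations alone cannot determine the alignment: the error factor (e·e')² enters every term.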
Method
FisherSketch estimates the product-kernel cosine directly in one streaming pass using factored Random Maclaurin features, producing a compact 16 KB signature per task at LLM vocabulary scale.
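To make the idea concrete, here is a minimal sketch of a streaming estimator built from factored degree-2 Random Maclaurin features. It is an illustration under stated assumptions, not the authors' implementation. It assumes the kernel (h·h')² (e·e')², uses dense Rademacher projections that would be far too large at real vocabulary sizes, and picks 4096 features only because 4096 float32 values occupy 16 KB. The class name `ToySketch` and all parameters are hypothetical.

```python
import numpy as np

class ToySketch:
    """Streaming mean of factored Random Maclaurin features (toy dimensions only)."""

    def __init__(self, d, v, n_features=4096, seed=0):
        # The seed must be identical for every task so signatures are comparable.
        rng = np.random.default_rng(seed)
        # Two independent Rademacher projections per factor give a degree-2 feature:
        # E[(w1.x)(w2.x)(w1.y)(w2.y)] = (x.y)^2.
        self.Wh = rng.choice([-1.0, 1.0], size=(2, n_features, d))
        self.We = rng.choice([-1.0, 1.0], size=(2, n_features, v))
        self.total = np.zeros(n_features)
        self.count = 0

    def features(self, H, E):
        zh = (H @ self.Wh[0].T) * (H @ self.Wh[1].T)  # activation factor
        ze = (E @ self.We[0].T) * (E @ self.We[1].T)  # error factor
        return zh * ze  # product feature for the product kernel

    def update(self, H, E):
        """Consume one batch; only a running sum and a count are kept."""
        self.total += self.features(H, E).sum(axis=0)
        self.count += len(H)

    def signature(self):
        """Mean feature vector, stored as float32."""
        return (self.total / self.count).astype(np.float32)

def cosine(a, b):
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
H = rng.normal(size=(50, 8))   # toy activations
E = rng.normal(size=(50, 6))   # toy error vectors

one_pass = ToySketch(d=8, v=6)
one_pass.update(H, E)

chunked = ToySketch(d=8, v=6)
for start in range(0, 50, 20):  # same data, fed in batches
    chunked.update(H[start:start + 20], E[start:start + 20])

sig = one_pass.signature()
print(sig.nbytes)  # 4096 float32 values occupy 16384 bytes
```

Because the signature is a plain mean of per-sample features, batches can arrive in any order and the result matches a single pass up to rounding. The cosine between two such signatures is a randomized estimate of the product-kernel cosine, and its accuracy depends on the number of features.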
In practice
- Use FisherSketch for training-free source selection in shared-vocabulary LLMs.
- Apply task signatures as diagnostic probes for activation, error, and coupling geometry.
- Enable nearest-neighbor task retrieval and open-set addition without retraining.
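Once each task is reduced to a fixed-length signature, retrieval and open-set addition become simple vector operations. The following sketch assumes signatures are NumPy vectors of equal length produced with the same random features. The task names, the tiny 3-dimensional vectors, and the `rank_sources` helper are made up for illustration.

```python
import numpy as np

def rank_sources(target_sig, source_sigs):
    """Rank candidate source tasks by cosine similarity to the target signature."""
    names = list(source_sigs)
    S = np.stack([np.asarray(source_sigs[n], dtype=np.float64) for n in names])
    t = np.asarray(target_sig, dtype=np.float64)
    scores = S @ t / (np.linalg.norm(S, axis=1) * np.linalg.norm(t))
    order = np.argsort(-scores)  # highest cosine first
    return [(names[i], float(scores[i])) for i in order]

# Hypothetical library of precomputed source signatures.
library = {
    "source_a": np.array([1.0, 0.0, 0.0], dtype=np.float32),
    "source_b": np.array([0.0, 1.0, 0.0], dtype=np.float32),
}
target = np.array([0.9, 0.1, 0.0], dtype=np.float32)

before = rank_sources(target, library)

# Open-set addition: a new source only needs its own signature, nothing is retrained.
library["source_c"] = np.array([0.9, 0.1, 0.0], dtype=np.float32)
after = rank_sources(target, library)

print(before[0][0], after[0][0])
```

For a large library, the same scores could be served from any standard nearest-neighbor index over the stored signatures.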
Topics
- FisherSketch
- LLM Transfer Learning
- Source Selection
- Neural Network Geometry
- Fisher Information
- Representation Similarity
- Vocabulary Scale
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.