The Geometry of Updates: Fisher Alignment at Vocabulary Scale

2025-12-12 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

FisherSketch is a method for training-free source selection across LLM families that share a vocabulary, aimed especially at "activation-dark" scientific string domains such as SMILES or genomics. It targets a limitation of representation-similarity metrics: without label-conditioned error geometry, they are uninformative about update compatibility. FisherSketch estimates head Fisher alignment, a measure of update similarity, by rewriting it as a cosine between kernel mean embeddings in a joint activation-error space. That form can be estimated in a single streaming pass, compressing each task from approximately 61 GiB to a 16 KB float32 signature while keeping 192 KB of streaming state per task. On Llama-3.1-8B, it reached 45.7% top-1 source selection across 100 domains and 66.7% top-1 in verbalizer-shift experiments, where activation-only methods collapsed.

Key takeaway

For AI scientists and ML engineers evaluating LLM transferability, representation-similarity metrics alone are insufficient, especially in "activation-dark" scenarios or under verbalizer shift. Consider using FisherSketch to predict parameter-update compatibility, since it captures the label-conditioned error geometry those metrics miss. This enables more reliable training-free source selection, which reduces negative transfer and saves compute before full fine-tuning. Audit automated selection mechanisms for potential bias amplification.

Key insights

Representation-only metrics fail to predict LLM update compatibility; FisherSketch measures joint activation-error geometry at vocabulary scale.

Principles

Representation-only metrics cannot determine head Fisher alignment without error geometry assumptions.
Head Fisher alignment is a product-kernel cosine of joint activation-error mean embeddings.
The Kronecker independence assumption (δ=0) is violated across tested architectures.
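
The first two principles can be checked numerically on toy data. The sketch below rests on two assumptions that this summary does not confirm: that the head is a softmax layer whose per-example gradient is the outer product of an error vector e = p - y and an activation h, and that "alignment" means the Frobenius cosine between empirical Fisher matrices. Under those assumptions the explicit cosine equals a cosine of mean embeddings under the product kernel (h·h')² (e·e')². All function names are hypothetical.

```python
import numpy as np

def toy_errors(rng, n, n_vocab):
    """Toy softmax errors E = p - onehot(y) for n examples."""
    logits = rng.normal(size=(n, n_vocab))
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    Y = np.eye(n_vocab)[rng.integers(n_vocab, size=n)]
    return P - Y

def fisher_cosine_explicit(H_a, E_a, H_b, E_b):
    """Frobenius cosine between explicit empirical Fisher matrices (toy sizes only)."""
    def fisher(H, E):
        # Per-example head gradient: outer(e_i, h_i), flattened to one row each.
        G = np.einsum("nv,nd->nvd", E, H).reshape(len(H), -1)
        return G.T @ G / len(H)
    F_a, F_b = fisher(H_a, E_a), fisher(H_b, E_b)
    return float(np.sum(F_a * F_b) / (np.linalg.norm(F_a) * np.linalg.norm(F_b)))

def fisher_cosine_kernel(H_a, E_a, H_b, E_b):
    """Same quantity via the product kernel, never forming a Fisher matrix."""
    def k(H1, E1, H2, E2):
        # Mean over all pairs of (h_i . h_j)^2 * (e_i . e_j)^2.
        return np.mean(((H1 @ H2.T) * (E1 @ E2.T)) ** 2)
    num = k(H_a, E_a, H_b, E_b)
    return float(num / np.sqrt(k(H_a, E_a, H_a, E_a) * k(H_b, E_b, H_b, E_b)))

rng = np.random.default_rng(0)
H_a, H_b = rng.normal(size=(40, 6)), rng.normal(size=(50, 6))
E_a, E_b = toy_errors(rng, 40, 5), toy_errors(rng, 50, 5)

explicit = fisher_cosine_explicit(H_a, E_a, H_b, E_b)
kernel = fisher_cosine_kernel(H_a, E_a, H_b, E_b)
print(explicit, kernel)  # the two values agree up to floating-point error

# Identical activations, different labels: activations alone cannot
# distinguish these two tasks, but the alignment is not 1.
E_c = toy_errors(rng, 40, 5)
same_activations = fisher_cosine_kernel(H_a, E_a, H_a, E_c)
print(same_activations)
```

The last two lines illustrate the first principle in miniature: any metric computed from `H_a` alone would treat the two tasks as identical, while the error term pulls the alignment away from 1.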

Method

FisherSketch estimates the product-kernel cosine directly in one streaming pass using factored Random Maclaurin features, producing compact task signatures (16 KB) for vocabulary-scale LLMs.
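
A minimal sketch of what such a streaming estimator could look like, not the paper's implementation. It assumes the product kernel (h·h')² (e·e')² and uses the standard Random Maclaurin construction for a degree-2 polynomial kernel: a feature is the product of two Rademacher projections, and one factor is drawn for activations and one for errors. The class name `ProductSketch`, the toy dimensions, and the choice of 4096 features are assumptions; 4096 float32 values happen to occupy 16 KB, matching the signature size quoted above. At real vocabulary scale, dense projections of the error vector would be far too large, so the actual method presumably handles that differently.

```python
import numpy as np

class ProductSketch:
    """Streaming mean of factored degree-2 Random Maclaurin features (toy sketch)."""

    def __init__(self, d_act, d_err, n_features=4096, seed=0):
        rng = np.random.default_rng(seed)
        signs = np.array([-1.0, 1.0])
        # Two Rademacher projections per feature and per factor.
        self.Wh = rng.choice(signs, size=(2, n_features, d_act))
        self.We = rng.choice(signs, size=(2, n_features, d_err))
        self.total = np.zeros(n_features)  # running feature sum (float64)
        self.count = 0

    def update(self, H, E):
        """Fold one batch of activations H (n, d_act) and errors E (n, d_err) in."""
        zh = (H @ self.Wh[0].T) * (H @ self.Wh[1].T)  # E[zh zh'] = (h . h')^2
        ze = (E @ self.We[0].T) * (E @ self.We[1].T)  # E[ze ze'] = (e . e')^2
        self.total += (zh * ze).sum(axis=0)
        self.count += len(H)

    def signature(self):
        """Mean feature vector, stored as float32."""
        return (self.total / self.count).astype(np.float32)

def cosine(a, b):
    a, b = a.astype(np.float64), b.astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
H = rng.normal(size=(256, 16))
E = rng.uniform(-1.0, 1.0, size=(256, 8))

one_shot = ProductSketch(16, 8)
one_shot.update(H, E)

streamed = ProductSketch(16, 8)  # same seed, so the same random projections
for start in range(0, len(H), 64):
    streamed.update(H[start:start + 64], E[start:start + 64])

sig = streamed.signature()
print(sig.nbytes)  # 4096 float32 values = 16384 bytes
```

Signatures are only comparable when they come from sketches built with the same seed, since the inner product of two signatures estimates the kernel inner product only under shared random projections. The estimate is unbiased for the inner product but noisy at small feature counts.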

In practice

Use FisherSketch for training-free source selection in shared-vocabulary LLMs.
Apply task signatures as diagnostic probes for activation, error, and coupling geometry.
Enable nearest-neighbor task retrieval and open-set addition without retraining.
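
Retrieval over fixed-size signatures reduces to a cosine ranking, and adding a new task only means storing one more vector. The sketch below illustrates that idea with a hypothetical `SignatureIndex` class and made-up three-dimensional signatures; it is not an interface from the paper.

```python
import numpy as np

class SignatureIndex:
    """Cosine nearest-neighbor lookup over task signatures (hypothetical helper)."""

    def __init__(self):
        self.names = []
        self.rows = []

    def add(self, name, signature):
        """Open-set addition: store one unit-normalized vector, no retraining."""
        v = np.asarray(signature, dtype=np.float64)
        self.names.append(name)
        self.rows.append(v / np.linalg.norm(v))

    def rank(self, target, k=3):
        """Return up to k (name, cosine) pairs, best candidate source first."""
        t = np.asarray(target, dtype=np.float64)
        scores = np.stack(self.rows) @ (t / np.linalg.norm(t))
        order = np.argsort(-scores)[:k]
        return [(self.names[i], float(scores[i])) for i in order]

index = SignatureIndex()
index.add("task_a", [1.0, 0.0, 0.0])
index.add("task_b", [0.0, 1.0, 0.0])

target = [0.9, 0.1, 0.0]
before = index.rank(target, k=1)

index.add("task_c", [0.9, 0.1, 0.0])  # new source added after the fact
after = index.rank(target, k=1)
print(before[0][0], after[0][0])
```

For a large catalog the same ranking could be served by an approximate nearest-neighbor library, since the signatures are ordinary dense vectors.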

Topics

FisherSketch
LLM Transfer Learning
Source Selection
Neural Network Geometry
Fisher Information
Representation Similarity
Vocabulary Scale

Code references

google/svcca

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.