The Geometry of Updates: Fisher Alignment at Vocabulary Scale

Source: stat.ML updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences) · Depth: Expert, extended

Summary

FisherSketch is a training-free source-selection method for LLM families that share a vocabulary, aimed at "activation-dark" scientific string domains such as SMILES or genomics. Representation-similarity metrics compare activations only, so they are uninformative when transfer depends on label-conditioned error geometry. FisherSketch instead estimates head Fisher alignment, a measure of how similar two tasks' parameter updates are, by rewriting it as a cosine between kernel mean embeddings in a joint activation-error space. The estimate needs only a single streaming pass and compresses each task from approximately 61 GiB to a 16 KB float32 signature, using 192 KB of streaming state per task. On Llama-3.1-8B, it reached 45.7% top-1 source-selection accuracy across 100 domains, and 66.7% top-1 in verbalizer-shift experiments where activation-only methods collapsed.
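The summary does not spell out how a Fisher quantity becomes a cosine between kernel mean embeddings. One reading consistent with the description, offered here as an assumption rather than the paper's exact definition, treats the output head as a linear map W, writes h for the final activation and e for the label-conditioned error (predicted probabilities minus the one-hot label), and uses the trace inner product between the two tasks' head Fisher matrices:

```latex
\begin{align*}
g(x,y) &= e\,h^{\top}, \qquad e = \operatorname{softmax}(W h) - \operatorname{onehot}(y) \\
F_T &= \mathbb{E}_{(x,y)\sim T}\!\left[\operatorname{vec}(g)\,\operatorname{vec}(g)^{\top}\right] \\
\operatorname{tr}(F_A F_B)
  &= \mathbb{E}_{a\sim A,\; b\sim B}\!\left[\langle g_a, g_b\rangle_F^{2}\right]
   = \mathbb{E}_{a\sim A,\; b\sim B}\!\left[(h_a^{\top}h_b)^{2}\,(e_a^{\top}e_b)^{2}\right] \\
  &= \langle \mu_A, \mu_B\rangle_{\mathcal{H}},
  \qquad \mu_T = \mathbb{E}_{T}\!\left[k\big((h,e),\cdot\big)\right],
  \quad k\big((h,e),(h',e')\big) = (h^{\top}h')^{2}\,(e^{\top}e')^{2} \\
\rho(A,B) &= \frac{\langle \mu_A, \mu_B\rangle_{\mathcal{H}}}
                 {\lVert \mu_A\rVert_{\mathcal{H}}\,\lVert \mu_B\rVert_{\mathcal{H}}}
\end{align*}
```

Under these assumptions the kernel factors into an activation term and an error term, which is why an activation-only comparison cannot recover the alignment: two tasks with identical h but different e can have very different values of k.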

Key takeaway

For AI Scientists and ML Engineers evaluating LLM transferability, representation-similarity metrics alone are insufficient, especially in "activation-dark" domains or under verbalizer shift. Consider using FisherSketch to predict parameter-update compatibility, since it captures the label-conditioned error geometry that those metrics miss. This makes training-free source selection more reliable, which reduces negative transfer and saves compute before full fine-tuning. Audit any automated selection mechanism for potential bias amplification.
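Once each task has a fixed-length signature, source selection reduces to ranking candidates by cosine similarity to the target. The sketch below illustrates that step only; `rank_sources` and the dictionary layout are hypothetical names chosen for this example, not an API from the paper.

```python
import numpy as np

def rank_sources(target_sig, source_sigs):
    """Rank candidate source tasks by cosine similarity to the target signature.

    target_sig  : 1-D array, signature of the target task
    source_sigs : dict mapping a source name to its 1-D signature (hypothetical layout)
    Returns a list of (name, cosine) pairs, best match first.
    """
    t = np.asarray(target_sig, dtype=np.float64)
    t_norm = np.linalg.norm(t)
    scores = []
    for name, sig in source_sigs.items():
        s = np.asarray(sig, dtype=np.float64)
        scores.append((name, float(t @ s / (t_norm * np.linalg.norm(s)))))
    # Highest cosine first: the top entry is the top-1 source selection
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because the ranking is fully automated, logging the whole ordered list (not just the top entry) makes it easier to audit which sources are systematically favored.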

Key insights

Representation-only metrics fail to predict LLM update compatibility; FisherSketch measures joint activation-error geometry at vocabulary scale.

Method

FisherSketch estimates the product-kernel cosine directly in one streaming pass using factored Random Maclaurin features, producing a compact 16 KB signature per task for vocabulary-scale LLMs.
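A minimal NumPy sketch of this idea follows. It assumes a degree-2 product kernel (h·h')² (e·e')² over activations h and errors e, which is an interpretation of the summary, not a confirmed detail. The class name `MaclaurinSketch`, its methods, and the default of 4096 features (picked so that a float32 signature occupies 16 KB) are all hypothetical, and the toy dimensions are far below vocabulary scale.

```python
import numpy as np

class MaclaurinSketch:
    """Streaming task signature for the assumed kernel (h.h')^2 * (e.e')^2."""

    def __init__(self, d_act, d_err, n_features=4096, seed=0):
        rng = np.random.default_rng(seed)
        # Two independent Rademacher projections per factor: the product
        # (w1.x)(w2.x) is an unbiased random feature for the kernel (x.x')^2.
        self.Wh = rng.choice([-1.0, 1.0], size=(2, n_features, d_act)).astype(np.float32)
        self.We = rng.choice([-1.0, 1.0], size=(2, n_features, d_err)).astype(np.float32)
        self.total = np.zeros(n_features, dtype=np.float64)  # running feature sum
        self.count = 0

    def features(self, H, E):
        # H: (n, d_act) activations, E: (n, d_err) errors (probabilities minus one-hot)
        fh = (H @ self.Wh[0].T) * (H @ self.Wh[1].T)
        fe = (E @ self.We[0].T) * (E @ self.We[1].T)
        # Factored form: the product-kernel feature is the elementwise product
        return fh * fe

    def update(self, H, E):
        # One streaming step: accumulate the batch, then discard it
        self.total += self.features(H, E).sum(axis=0)
        self.count += H.shape[0]

    def signature(self):
        # Empirical kernel mean embedding in random-feature space
        return (self.total / self.count).astype(np.float32)

def cosine(a, b):
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Two tasks are comparable only if their sketches share the same seed and dimensions, since the signatures must live in the same random-feature space. The cosine of two signatures then approximates the kernel-mean cosine, with accuracy improving as the feature count grows.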

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.