The Geometry of Updates: Fisher Alignment at Vocabulary Scale

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, quick

Summary

Fisher Alignment is a new approach to training-free source selection for Large Language Model (LLM) families that share a vocabulary, particularly in scientific string domains such as SMILES or genomic sequences. It targets two gaps in existing metrics: representation-similarity metrics are uninformative unless one makes assumptions about the label-conditioned error geometry, and classical update-geometry metrics are computationally prohibitive at vocabulary scale. The core identity shows that head Fisher alignment is a cosine between kernel mean embeddings in the joint activation-error space, which exposes separate activation, error, and coupling factors. To make this practical, FisherSketch estimates the cosine directly in a single streaming pass, requiring only a 16 KB task signature (m=4096) and 192 KB of per-task streaming state for head Fisher alignment at a vocabulary size of K=128,256. Beyond source selection, FisherSketch also serves as a diagnostic instrument: validated in verbalizer-shift experiments on Llama-3.1-8B, it can be used to study whether LLM task similarity is driven by activations, errors, or their coupling, even when activation similarity fails to distinguish tasks.
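
The identity in the summary can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes a softmax head whose per-example gradient is the outer product of an error vector e = p - onehot(y) and an activation h, and it takes "head Fisher" to mean the second moment of those gradients. Under those assumptions the Frobenius cosine between two tasks' matrices equals a cosine between means of the product kernel k = (h·h')^2 (e·e')^2. All function names, the synthetic data, and the column roll used as a stand-in for a verbalizer shift are hypothetical.

```python
import numpy as np

def head_gradients(H, E):
    """Per-example head gradients vec(e h^T), shape (n, K*d)."""
    return np.einsum("nk,nd->nkd", E, H).reshape(len(H), -1)

def dense_alignment(Ha, Ea, Hb, Eb):
    """Cosine between explicit (K*d x K*d) gradient second-moment matrices."""
    Ga, Gb = head_gradients(Ha, Ea), head_gradients(Hb, Eb)
    Fa, Fb = Ga.T @ Ga / len(Ga), Gb.T @ Gb / len(Gb)
    return np.sum(Fa * Fb) / (np.linalg.norm(Fa) * np.linalg.norm(Fb))

def kernel_mean(Ha, Ea, Hb, Eb):
    """Average of k = (h.h')^2 (e.e')^2 over all cross-task pairs."""
    return np.mean((Ha @ Hb.T) ** 2 * (Ea @ Eb.T) ** 2)

def kernel_alignment(Ha, Ea, Hb, Eb):
    """Same cosine, computed without forming any K*d x K*d matrix."""
    ab = kernel_mean(Ha, Ea, Hb, Eb)
    return ab / np.sqrt(kernel_mean(Ha, Ea, Ha, Ea) * kernel_mean(Hb, Eb, Hb, Eb))

def toy_task(rng, n, d, K, labels):
    """Random activations and softmax errors e = p - onehot(y); purely synthetic."""
    H = rng.normal(size=(n, d))
    logits = rng.normal(size=(n, K))
    P = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    y = rng.choice(labels, size=n)
    E = P.copy()
    E[np.arange(n), y] -= 1.0
    return H, E

rng = np.random.default_rng(0)
Ha, Ea = toy_task(rng, n=200, d=4, K=6, labels=[0, 1, 2])
# Toy stand-in for a verbalizer shift (assumption): identical activations,
# with the errors moved to different vocabulary slots.
Hb, Eb = Ha, np.roll(Ea, 3, axis=1)

dense = dense_alignment(Ha, Ea, Hb, Eb)
kernel = kernel_alignment(Ha, Ea, Hb, Eb)
# Activation-only cosine: drops the error factor from the kernel.
activation_only = np.mean((Ha @ Hb.T) ** 2) / np.sqrt(
    np.mean((Ha @ Ha.T) ** 2) * np.mean((Hb @ Hb.T) ** 2))
print(f"dense={dense:.4f} kernel={kernel:.4f} activation_only={activation_only:.4f}")
```

In this toy setup the dense and kernel routes agree to floating-point precision, and the activation-only cosine cannot separate the two tasks because their activations are identical, while the joint alignment can. That mirrors the diagnostic use described in the summary, though only as an analogy.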

Key takeaway

For Machine Learning Engineers optimizing LLM performance in scientific string domains, representation-similarity metrics alone can be an unreliable guide to transfer, because they say little without assumptions about error geometry. Consider FisherSketch for training-free source selection: it offers a computationally efficient way to rank candidate sources for transfer, with a memory footprint of only 16 KB per task signature. It also doubles as a diagnostic tool for understanding whether task similarity is driven by activations, errors, or their coupling, which can guide fine-tuning and adaptation strategies.

Key insights

Fisher Alignment and FisherSketch enable efficient, training-free LLM source selection and task similarity diagnostics at vocabulary scale.

Method

FisherSketch estimates the head Fisher alignment cosine directly in a single streaming pass, using a 16 KB task signature and 192 KB of streaming state, which keeps it practical at large vocabulary sizes.
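
To make the idea of a fixed-size signature built in one streaming pass concrete, here is a minimal random-feature sketch. It is not FisherSketch itself, whose construction the summary does not describe. It assumes the product kernel k = (h·h')^2 (e·e')^2 and approximates its mean embedding with products of Rademacher projections. The class name, seed handling, and toy dimensions are all hypothetical. Storing m=4096 values in single precision takes 16 KB, which is consistent with the reported signature size, but that correspondence is an inference.

```python
import numpy as np

class ToyFisherSignature:
    """Streaming random-feature signature for k = (h.h')^2 (e.e')^2 (toy scale only)."""

    def __init__(self, d, K, m=4096, seed=0):
        # All tasks must share the seed so their signatures live in the same feature space.
        rng = np.random.default_rng(seed)
        # Two independent Rademacher projections per factor: the product of the
        # four projections is an unbiased random feature for the squared kernel.
        self.U = rng.choice([-1.0, 1.0], size=(2, m, K))  # error-side projections
        self.V = rng.choice([-1.0, 1.0], size=(2, m, d))  # activation-side projections
        self.total = np.zeros(m)
        self.count = 0

    def update(self, H, E):
        """Fold in a batch of activations H (n, d) and errors E (n, K)."""
        pe = np.einsum("rmk,nk->rnm", self.U, E)
        ph = np.einsum("rmd,nd->rnm", self.V, H)
        self.total += (pe[0] * ph[0] * pe[1] * ph[1]).sum(axis=0)
        self.count += len(H)

    def signature(self):
        """Mean feature vector: m float32 values (16 KB when m = 4096)."""
        return (self.total / self.count).astype(np.float32)

def cosine(a, b):
    """Cosine similarity between two signatures, computed in float64."""
    a, b = a.astype(np.float64), b.astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
d, K, n = 4, 6, 256
H, E = rng.normal(size=(n, d)), rng.normal(size=(n, K))  # synthetic stand-ins

one_pass = ToyFisherSignature(d, K)
one_pass.update(H, E)

streamed = ToyFisherSignature(d, K)
for start in range(0, n, 64):  # same data, fed in four chunks
    streamed.update(H[start:start + 64], E[start:start + 64])

sig_a, sig_b = one_pass.signature(), streamed.signature()
print(sig_a.nbytes, cosine(sig_a, sig_b))
```

The cosine between two tasks' signatures then serves as a noisy estimate of their alignment. The dense projection matrices used here grow linearly with K and would be impractical at K=128,256, so the 192 KB streaming state reported for FisherSketch presumably relies on a more compact construction that this sketch does not attempt to reproduce.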

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.