One Language, Two Scripts: Probing Script-Invariance in LLM Concept Representations
Summary
A study investigated whether Sparse Autoencoder (SAE) features in Large Language Models (LLMs) encode abstract meaning or are tied to a text's orthography, using Serbian digraphia as a controlled testbed. Serbian is written interchangeably in Latin and Cyrillic, and the two scripts map onto each other almost character for character, yet LLM tokenizers split them into completely disjoint sets of tokens. Analyzing SAE feature activations across the Gemma model family (270M to 27B parameters) with Gemma Scope 2 SAEs, the researchers found that the same sentence written in the two scripts activated highly overlapping features: the average Jaccard similarity was ~0.58, significantly above random baselines (~0.28). This cross-script similarity was also higher than the similarity between paraphrases written in the same script, suggesting that SAE features prioritize meaning over orthographic form. Script invariance strengthened with model scale, indicating that larger models develop more robust script-independent representations.
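The headline comparison reduces to Jaccard similarity between sets of active SAE latents. The sketch below assumes each sentence has already been encoded into a single SAE activation vector and treats a latent as active when its activation is positive. This summary does not state how the study pools over tokens, chooses layers, or thresholds activations, so those choices are assumptions. `active_features`, `mean_paired_jaccard`, and `random_baseline` are hypothetical helpers, the mismatched-pair baseline is one plausible reading of "random baseline" rather than the paper's confirmed definition, and the toy vectors are invented.

```python
from __future__ import annotations

import random
from typing import Sequence


def active_features(acts: Sequence[float], threshold: float = 0.0) -> set[int]:
    """Return indices of SAE latents whose activation exceeds `threshold` (assumed rule)."""
    return {i for i, a in enumerate(acts) if a > threshold}


def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard similarity |A & B| / |A | B|; 1.0 by convention when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def mean_paired_jaccard(xs: list[set[int]], ys: list[set[int]]) -> float:
    """Average Jaccard over aligned pairs: xs[i] and ys[i] encode the same sentence."""
    return sum(jaccard(x, y) for x, y in zip(xs, ys)) / len(xs)


def random_baseline(xs: list[set[int]], ys: list[set[int]], seed: int = 0) -> float:
    """Average Jaccard when each xs item is paired with ys from a *different* sentence."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two sentences for a mismatched baseline")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Pairing order[k] with order[k + 1] (cyclically) never matches a sentence to itself.
    return sum(jaccard(xs[order[k]], ys[order[(k + 1) % n]]) for k in range(n)) / n


# Invented toy activations: 2 sentences x 6 latents, one vector per script.
latin_acts = [
    [0.0, 1.2, 0.0, 0.7, 0.3, 0.0],
    [0.9, 0.0, 0.4, 0.0, 0.0, 0.8],
]
cyrillic_acts = [
    [0.0, 1.0, 0.0, 0.5, 0.1, 0.2],
    [1.1, 0.0, 0.6, 0.0, 0.0, 0.0],
]

latin_sets = [active_features(v) for v in latin_acts]
cyrillic_sets = [active_features(v) for v in cyrillic_acts]
print(f"cross-script: {mean_paired_jaccard(latin_sets, cyrillic_sets):.3f}")  # cross-script: 0.708
print(f"random baseline: {random_baseline(latin_sets, cyrillic_sets):.3f}")  # random baseline: 0.083
```

Pairing each Latin sentence with the Cyrillic version of a different sentence keeps the distribution of active-set sizes unchanged while removing shared meaning, which is what makes the gap between the two numbers interpretable.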
Key takeaway
For research scientists investigating LLM interpretability, this work demonstrates that SAE features can capture semantic content abstractly, beyond surface-level tokenization. Consider using controlled linguistic testbeds such as Serbian digraphia to probe how abstract learned representations are, especially when evaluating model robustness to orthographic variation. This approach can help test whether a model's representations track meaning rather than script-specific patterns, and how that changes as model scale increases.
Key insights
SAE features capture abstract semantic meaning independent of surface-level orthography and tokenization.
Principles
- Meaning drives representational similarity more than orthography.
- Larger models develop more robust script-invariant representations.
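The summary reports that cross-script similarity exceeded within-script paraphrase similarity but does not say which significance test the authors used. As one option, swapped in here and not necessarily the paper's procedure, a paired sign-flip permutation test checks whether per-sentence cross-script Jaccard scores are reliably higher than paraphrase scores for the same sentences. `paired_sign_flip_test` is a hypothetical helper, and the scores are made-up toy values, not results from the study.

```python
from __future__ import annotations

import random
from typing import Sequence


def paired_sign_flip_test(
    a: Sequence[float], b: Sequence[float], n_permutations: int = 10_000, seed: int = 0
) -> tuple[float, float]:
    """One-sided paired permutation test of H1: mean(a - b) > 0.

    Under H0 each paired difference is symmetric around zero, so its sign can be
    flipped at random. The p-value is the fraction of sign-flipped means at least
    as large as the observed mean, with the usual +1 correction.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_permutations):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if flipped >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)


# Made-up per-sentence Jaccard scores for the same 8 sentences in both conditions.
cross_script = [0.71, 0.66, 0.74, 0.69, 0.72, 0.68, 0.70, 0.73]
paraphrase = [0.52, 0.49, 0.55, 0.47, 0.51, 0.50, 0.53, 0.48]

mean_diff, p_value = paired_sign_flip_test(cross_script, paraphrase)
print(f"mean difference = {mean_diff:.3f}, one-sided p = {p_value:.4f}")
```

With all eight toy differences positive, the exact one-sided p-value for n = 8 is 1/256 (about 0.004), so the Monte Carlo estimate should land near that value. Pairing matters here: comparing the two conditions sentence by sentence removes variation caused by some sentences simply activating more features than others.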
Method
The study used Serbian digraphia as a controlled evaluation paradigm: it compared SAE feature activations for identical sentences written in Latin and Cyrillic, two scripts related by a deterministic character mapping but tokenized into disjoint token sets.
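To make the "deterministic mapping, disjoint tokens" setup concrete, the sketch below transliterates Serbian Cyrillic into Latin (Gaj's alphabet) with a 30-entry lookup table. The names `to_latin` and `letters` are illustrative, not from the paper, and the study's actual data-preparation pipeline is not described in this summary. Cyrillic to Latin is a pure lookup; the reverse direction needs extra care because a Latin digraph such as "nj" occasionally stands for two Cyrillic letters (for example "konjunkcija" is written конјункција, not with њ). The final check is only a character-level illustration and does not run a tokenizer.

```python
from __future__ import annotations

# Serbian Cyrillic alphabet (30 letters, lowercase) and the matching Latin letters.
CYRILLIC = "абвгдђежзијклљмнњопрстћуфхцчџш"
LATIN = ["a", "b", "v", "g", "d", "đ", "e", "ž", "z", "i", "j", "k", "l", "lj", "m",
         "n", "nj", "o", "p", "r", "s", "t", "ć", "u", "f", "h", "c", "č", "dž", "š"]
assert len(CYRILLIC) == len(LATIN) == 30

CYR_TO_LAT: dict[str, str] = {}
for cyr, lat in zip(CYRILLIC, LATIN):
    CYR_TO_LAT[cyr] = lat
    # Uppercase digraphs use title case (Љ -> "Lj"); all-caps text would need "LJ".
    CYR_TO_LAT[cyr.upper()] = lat.capitalize()


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin; other characters pass through unchanged."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)


def letters(text: str) -> set[str]:
    """Set of alphabetic characters (Unicode code points) in `text`."""
    return {ch for ch in text if ch.isalpha()}


cyrillic = "Ђаци читају књиге."
latin = to_latin(cyrillic)
print(latin)  # Đaci čitaju knjige.
print(letters(cyrillic).isdisjoint(letters(latin)))  # True
```

The two versions of the sentence carry identical content but share no letters, so any subword token that contains a letter necessarily differs between them; this is the property that lets overlapping SAE features be read as evidence of meaning rather than of shared tokens.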
In practice
- Serbian digraphia can serve as a testbed for evaluating how abstract learned representations are.
- SAE features offer a lens for studying how LLMs represent the same content across scripts.
Topics
- Sparse Autoencoders
- Script Invariance
- LLM Interpretability
- Gemma Models
- Serbian Digraphia
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.