One Language, Two Scripts: Probing Script-Invariance in LLM Concept Representations

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

The study asks whether Sparse Autoencoder (SAE) features in Large Language Models (LLMs) represent abstract meaning or are tied to the orthography of the text, using Serbian digraphia as a controlled testbed. Serbian is written interchangeably in Latin and Cyrillic scripts with a near-perfect character mapping, yet LLM tokenizers split the two scripts into entirely different token sequences that share no tokens. Analyzing SAE feature activations across the Gemma model family (270M to 27B parameters) with Gemma Scope 2 SAEs, the researchers found that the same sentence written in the two scripts activated highly overlapping feature sets, with an average Jaccard similarity of ~0.58, significantly exceeding the random baseline of ~0.28. Cross-script similarity was also higher than within-script paraphrase similarity, which suggests that SAE features track meaning more closely than orthographic form. Script invariance strengthened with model scale, indicating that larger models develop more robust script-independent representations.
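To make the overlap metric concrete, here is a minimal sketch of Jaccard similarity between the sets of active SAE features for two inputs. The activation vectors are toy values, and the rule for calling a feature "active" (activation above a threshold of 0.0) is an assumption for illustration; the summary does not say how the study thresholds or aggregates activations.

```python
from typing import Sequence, Set


def active_features(activations: Sequence[float], threshold: float = 0.0) -> Set[int]:
    """Return the indices of features whose activation exceeds the threshold.

    The threshold is an illustrative assumption, not the study's setting.
    """
    return {i for i, a in enumerate(activations) if a > threshold}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|, defined as 1.0 for two empty sets."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Toy activation vectors standing in for one sentence in each script (hypothetical data).
latin_acts = [0.0, 1.2, 0.0, 0.7, 0.3, 0.0]
cyrillic_acts = [0.0, 0.9, 0.4, 0.5, 0.0, 0.0]

similarity = jaccard(active_features(latin_acts), active_features(cyrillic_acts))
print(similarity)  # 0.5: features {1, 3} are shared out of {1, 2, 3, 4}
```

A random baseline in the same spirit would apply the same function to pairs of unrelated sentences, so the cross-script score can be read against what unrelated text produces.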

Key takeaway

For research scientists working on LLM interpretability, this work shows that SAE features can capture semantic content at a level of abstraction above surface tokenization. Consider using controlled linguistic testbeds such as Serbian digraphia to probe how abstract a model's learned representations are, especially when evaluating robustness to orthographic variation. Such tests can help check whether a model encodes meaning rather than script-specific patterns, and how that changes as model scale increases.

Key insights

SAE features capture abstract semantic meaning largely independently of surface-level orthography and tokenization.

Principles

Method

The study used Serbian digraphia as a controlled evaluation paradigm: it compared SAE feature activations for identical sentences written in Latin and in Cyrillic, two scripts that map onto each other deterministically but are tokenized into disjoint token sets.
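The script mapping behind this paradigm can be sketched as a character-level lookup. The example below converts Serbian Cyrillic to Latin using the standard 30-letter correspondence; the function name and the choice to render capital digraph letters in title case (for example "Lj") are illustrative assumptions, and this is not the study's own preprocessing code.

```python
# Standard Serbian Cyrillic -> Latin correspondence (30 lowercase letters).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

# Add uppercase letters; digraphs become title case ("Lj"), a simplifying
# assumption that is not right for text written entirely in capitals.
CYR_TO_LAT.update({k.upper(): v.capitalize() for k, v in list(CYR_TO_LAT.items())})


def cyrillic_to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin; other characters pass through."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)


print(cyrillic_to_latin("Љубав и њива"))  # Ljubav i njiva
```

The Cyrillic-to-Latin direction is used here because it is a pure one-character lookup. Going from Latin to Cyrillic requires matching the digraphs lj, nj, and dž before single letters, and a small number of words contain those letter pairs as two separate sounds, which is why the mapping is near-perfect rather than perfect in that direction.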

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.