Modular Monolingual Adaptation using Pretrained Language Models

2026-06-08 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

This work introduces a modular monolingual adaptation strategy for pretrained language models (PLMs) to improve performance in low-resource languages like Scottish Gaelic, Irish, and Quechua (8.5k training instances). Instead of full model finetuning, the proposed method replaces the tokenizer with a language-specific one, initializes new embeddings (using model-based, FastText, or random strategies), freezes these embedding layers, and trains only the non-embedding parameters using a masked language modeling objective. Experiments across BERT, mBERT, and mmBERT models demonstrate that this non-embedding training consistently outperforms or matches full finetuning on mask-filling tasks, while requiring significantly fewer trainable parameters (reducing them by approximately 25% for mBERT) and lowering training costs. For downstream NER and POS tagging tasks, the modular approach performs comparably to full finetuning, validating its effectiveness and efficiency. The custom tokenizer is crucial for performance gains, while embedding initialization choice has a minor impact.
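To see why freezing embeddings removes a sizable share of trainable parameters, the following back-of-envelope sketch counts parameters for a BERT-base-shaped encoder. The shape values (hidden size 768, 12 layers, feed-forward size 3072, 512 positions, 2 token types) are the standard BERT-base configuration and are an assumption here, as are the helper names; the pooler and MLM head are ignored, so the result will not reproduce the paper's exact figure.

```python
def embedding_params(vocab, hidden, max_pos=512, type_vocab=2):
    """Word, position and token-type tables plus the embedding LayerNorm."""
    return vocab * hidden + max_pos * hidden + type_vocab * hidden + 2 * hidden


def encoder_params(hidden, layers, intermediate):
    """Self-attention, feed-forward and LayerNorm parameters for all layers."""
    attention = 4 * (hidden * hidden + hidden)           # Q, K, V, output projections
    feed_forward = (hidden * intermediate + intermediate  # up-projection
                    + intermediate * hidden + hidden)     # down-projection
    layer_norms = 2 * (2 * hidden)                        # two LayerNorms per layer
    return layers * (attention + feed_forward + layer_norms)


def frozen_fraction(vocab, hidden=768, layers=12, intermediate=3072):
    """Share of (embedding + encoder) parameters that sits in the embeddings."""
    emb = embedding_params(vocab, hidden)
    enc = encoder_params(hidden, layers, intermediate)
    return emb / (emb + enc)


if __name__ == "__main__":
    # 30k is the custom vocabulary size quoted in this digest.
    print(f"frozen share with a 30k vocab: {frozen_fraction(30_000):.1%}")
```

Under these assumptions the embedding block holds a bit over one fifth of the counted parameters, the same ballpark as the roughly 25% reduction the summary cites for mBERT; the exact figure depends on which parameters are counted.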

Key takeaway

Machine learning engineers adapting pretrained language models to low-resource languages should consider a modular approach: build a custom tokenizer, freeze the embedding layers, and train only the non-embedding parameters. In the reported experiments this strategy matches or outperforms full finetuning on mask filling, performs comparably on NER and POS tagging, and is also reported to outperform LoRA, while reducing trainable parameters and VRAM usage. Prioritize custom tokenization, which drives most of the gains; the choice of embedding initialization has only a minor impact.

Key insights

Freezing embeddings during monolingual adaptation of PLMs for low-resource languages prevents overfitting and improves performance.

Principles

Full model tuning is often unnecessary for low-resource adaptation.
Custom language-specific tokenizers significantly boost performance.
Freezing embeddings acts as an effective regularization strategy.

Method

Build a custom WordPiece tokenizer (30k vocabulary), initialize the new embeddings (model-based, FastText, or random), freeze the input and output embeddings, then train only the non-embedding parameters with a masked language modeling (MLM) objective.
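The freeze-then-train step can be sketched in PyTorch as follows. This is a minimal illustration on a toy model, not the authors' implementation: `TinyMLM`, `freeze_embeddings`, and `mlm_step` are hypothetical names, the embeddings are randomly initialized, and masking is simplified to replacing every selected token with the mask id (BERT's 80/10/10 scheme is omitted).

```python
import torch
from torch import nn


class TinyMLM(nn.Module):
    """Toy stand-in for a PLM: embeddings -> encoder -> tied output projection."""

    def __init__(self, vocab_size=100, hidden=32):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=4, dim_feedforward=64, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(hidden, vocab_size, bias=False)
        self.decoder.weight = self.embeddings.weight  # tie input/output embeddings

    def forward(self, ids):
        return self.decoder(self.encoder(self.embeddings(ids)))


def freeze_embeddings(model):
    """Exclude embeddings from training; the tied decoder shares the same tensor."""
    for param in model.embeddings.parameters():
        param.requires_grad = False


def mlm_step(model, optimizer, ids, mask_id, mask_prob=0.15):
    """One simplified MLM update: mask tokens, predict them, step the optimizer."""
    mask = torch.rand(ids.shape) < mask_prob
    mask[0, 0] = True                        # guarantee at least one target
    labels = ids.masked_fill(~mask, -100)    # -100 positions are ignored by the loss
    inputs = ids.masked_fill(mask, mask_id)
    logits = model(inputs)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), labels.reshape(-1), ignore_index=-100
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
model = TinyMLM()
freeze_embeddings(model)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

ids = torch.randint(1, 100, (4, 16))         # id 0 is reserved as the mask token
emb_before = model.embeddings.weight.clone()
loss = mlm_step(model, optimizer, ids, mask_id=0)
```

With a Hugging Face `transformers` model, the same idea would likely go through `model.get_input_embeddings()` and `model.get_output_embeddings()` rather than a hand-named attribute; passing only the trainable parameters to the optimizer is what keeps optimizer state small.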

In practice

Implement custom tokenizers for low-resource language LMs.
Freeze embedding layers to prevent overfitting in low-resource settings.
Prioritize non-embedding parameter tuning over full finetuning.
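For the first item, a custom WordPiece tokenizer can be trained with the Hugging Face `tokenizers` library. The sketch below assumes that library is installed; the four Irish phrases are a placeholder corpus, `train_wordpiece` is a hypothetical helper, and keeping case and accents is a choice made here, not a setting confirmed by the source. The 30k vocabulary size follows the Method section.

```python
from tokenizers import BertWordPieceTokenizer

SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def train_wordpiece(lines, vocab_size=30_000, min_frequency=1):
    """Train a WordPiece tokenizer from an iterable of raw text lines."""
    # Keep case and diacritics: accents carry meaning in Irish and Scottish Gaelic.
    tokenizer = BertWordPieceTokenizer(lowercase=False, strip_accents=False)
    tokenizer.train_from_iterator(
        lines,
        vocab_size=vocab_size,        # upper bound; tiny corpora yield fewer entries
        min_frequency=min_frequency,
        special_tokens=SPECIAL_TOKENS,
        show_progress=False,
    )
    return tokenizer


# Placeholder corpus; a real run would stream the monolingual training text.
corpus = ["Dia duit", "Go raibh maith agat", "Tá fáilte romhat", "Conas atá tú?"]
tokenizer = train_wordpiece(corpus)
encoding = tokenizer.encode("Go raibh maith agat")
print(encoding.tokens)
```

The trained vocabulary then determines the size of the new embedding matrix that is initialized and frozen before MLM training.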

Topics

Low-Resource Languages
Pretrained Language Models
Modular Adaptation
Tokenizer Customization
Embedding Freezing
Natural Language Understanding
Parameter Efficiency

Code references

knalin55/MMA-PLM

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.