Modular Monolingual Adaptation using Pretrained Language Models

Source: Computation and Language · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Natural Language Processing) · Depth: Expert, quick

Summary

This work proposes a modular method for adapting Pretrained Language Models (PLMs) to low-resource languages. Common practice is to finetune the entire model, often together with a language-specific tokenizer, because this is typically favored over training from scratch. The proposed method instead replaces tokens, freezes their corresponding embeddings, and tunes only the remaining parts of the model. Experiments cover Scottish Gaelic, Irish, and Quechua; Quechua is the very low-resource case, with only 8.5k training instances. Evaluation on natural language understanding (NLU) tasks, including mask filling, Named Entity Recognition (NER), and Part-of-Speech (POS) tagging, shows that the modular strategy improves adaptation performance. The work also analyzes training strategies, the choice of pretrained embeddings, and model architectures.

Key takeaway

For Machine Learning Engineers adapting PLMs to low-resource languages: consider a modular finetuning strategy. Instead of finetuning the full model, replace tokens, freeze their corresponding embeddings, and tune only the remaining components. The experiments summarized here report improved NLU performance with this approach, including for Quechua with only 8.5k training instances. Freezing the embeddings also reduces the number of trainable parameters, which may lower compute and training time, although no such measurements are reported in this summary.

Key insights

Modular PLM adaptation (replacing tokens, freezing their embeddings, and tuning the rest of the model) improves NLU performance in low-resource languages.

Principles

Method

Replace tokens, freeze their corresponding embeddings, then finetune the remaining components of the pretrained language model on the target low-resource language dataset.
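
The method above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `TinyEncoder` is a hypothetical toy stand-in for a real PLM, `swap_and_freeze_embeddings` and `finetune` are illustrative helper names, and the random tensors stand in for target-language embeddings and training data. The summary does not say how the replacement embeddings are obtained.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Toy stand-in for a pretrained language model (hypothetical)."""

    def __init__(self, vocab_size: int, dim: int = 16, num_labels: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
        self.head = nn.Linear(dim, num_labels)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Per-token logits, as in a tagging task such as POS or NER.
        return self.head(self.body(self.embed(token_ids)))


def swap_and_freeze_embeddings(model: TinyEncoder, new_vectors: torch.Tensor) -> None:
    """Replace the embedding table with new token vectors and freeze it."""
    # freeze=True sets requires_grad=False on the embedding weight.
    model.embed = nn.Embedding.from_pretrained(new_vectors.clone(), freeze=True)


def finetune(model: nn.Module, token_ids: torch.Tensor, labels: torch.Tensor,
             steps: int = 20, lr: float = 1e-2) -> float:
    """Tune only the parameters that still require gradients."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    loss = torch.tensor(0.0)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(token_ids)  # shape: (batch, seq_len, num_labels)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
        loss.backward()
        optimizer.step()
    return loss.item()


torch.manual_seed(0)
model = TinyEncoder(vocab_size=100)

# Assumption: 50 target-language tokens with 16-dimensional vectors (random here).
target_vectors = torch.randn(50, 16)
swap_and_freeze_embeddings(model, target_vectors)

# Random placeholder batch: 4 sequences of 8 token ids with per-token labels.
token_ids = torch.randint(0, 50, (4, 8))
labels = torch.randint(0, 3, (4, 8))

body_before = model.body[0].weight.detach().clone()
final_loss = finetune(model, token_ids, labels)
```

Passing only the parameters with `requires_grad=True` to the optimizer keeps the frozen embedding table out of the update step entirely, so the embeddings stay exactly as supplied while the body and head are tuned.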

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.