Evaluating Small Language Models for English-to-Portuguese Translation: Impact of Model Scale and Quantization

· Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, medium

Summary

A study benchmarked dozens of small language models (SLMs), ranging from 135M to 20B parameters, on English-to-Portuguese translation across architectures and three precision settings: an FP16 baseline and two quantization schemes (Q8_0 and Q4_K_M). The researchers used the FLORES-101 Portuguese subset (1,012 sentences) and OPUS-100 (~10k sentences), measuring translation quality with BLEU, chrF, and BERTScore. Statistical analysis with Friedman tests and Wilcoxon signed-rank post-hoc comparisons showed that 8-bit quantization (Q8_0) largely preserves semantic quality. 4-bit quantization (Q4_K_M) produced statistically significant degradation in about half of the configurations, but the effect sizes were negligible to small and fell mainly on lower-capacity models. The study also found only a weak correlation between model scale and translation quality, with medium-sized models sometimes outperforming larger ones.
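To make the omnibus step of that statistical analysis concrete, here is a minimal sketch of a Friedman test over three matched conditions (FP16, Q8_0, Q4_K_M). It is an illustration under stated assumptions, not the paper's analysis code: the function names and the toy scores are hypothetical, the tie correction used by statistics libraries is omitted, and the p-value relies on the chi-square approximation, whose survival function with 2 degrees of freedom is exp(-x/2).

```python
import math


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_three_conditions(rows):
    """Friedman statistic and approximate p-value for k = 3 matched conditions.

    rows: one (fp16, q8_0, q4_k_m) score triple per block, e.g. per model
    or per test sentence. No tie correction is applied (simplification).
    """
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(list(row))):
            rank_sums[col] += rank
    stat = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    p_value = math.exp(-stat / 2.0)  # chi-square survival function, df = k - 1 = 2
    return stat, p_value


# Hypothetical scores (not from the paper): FP16 > Q8_0 > Q4_K_M in every block.
toy_rows = [(0.90, 0.89, 0.85), (0.80, 0.79, 0.70), (0.75, 0.74, 0.73), (0.60, 0.59, 0.50)]
toy_stat, toy_p = friedman_three_conditions(toy_rows)
```

With a perfectly consistent ordering across all four toy blocks, the statistic reaches its maximum of n(k - 1) = 8. A significant omnibus result would then be followed by pairwise Wilcoxon signed-rank comparisons, as the summary describes.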

Key takeaway

AI engineers designing English-to-Portuguese translation pipelines should prioritize 8-bit quantization (Q8_0), which cuts computational and deployment costs without substantial loss of semantic quality. Do not assume that larger models translate better; evaluate medium-sized SLMs as well, since depending on architecture and pretraining they can match or exceed their larger counterparts.
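The deployment savings can be estimated with simple arithmetic. The sketch below assumes the scheme names refer to llama.cpp's GGUF quantization types, which the summary does not state explicitly. Under that assumption, Q8_0 stores 32 int8 weights plus one 16-bit scale per block, i.e. 8.5 bits per weight. The Q4_K_M figure of about 4.8 bits per weight is a rough average that varies by architecture and is an assumption here. The helper name is hypothetical, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
# Approximate storage cost per weight, in bits.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,    # 32 int8 values + one fp16 scale per block = 34 bytes / 32 weights
    "Q4_K_M": 4.8,  # assumption: rough average for this mixed-precision scheme
}


def weight_size_gb(n_params, scheme):
    """Estimated size of the model weights alone, in decimal gigabytes."""
    return n_params * BITS_PER_WEIGHT[scheme] / 8 / 1e9


# The two ends of the parameter range mentioned in the summary.
for n_params in (135e6, 20e9):
    sizes = {s: round(weight_size_gb(n_params, s), 2) for s in BITS_PER_WEIGHT}
    print(f"{n_params:.3g} params: {sizes}")
```

Under these assumptions, moving from FP16 to Q8_0 roughly halves the weight footprint, which is the saving the takeaway refers to.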

Key insights

8-bit quantization maintains translation quality in SLMs, while model scale weakly correlates with performance.

Principles

Method

SLMs (135M to 20B parameters) were benchmarked on English-to-Portuguese translation at FP16 precision and under Q8_0 and Q4_K_M quantization, using the FLORES-101 and OPUS-100 datasets and scoring outputs with BLEU, chrF, and BERTScore.
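Of the three metrics, chrF is the easiest to illustrate from scratch: it is an F-score over character n-grams. The following is a simplified, sentence-level sketch in the spirit of chrF (character n-grams up to order 6, beta = 2, whitespace ignored). It will not reproduce the scores of an evaluation toolkit exactly, and the function names and example sentences are hypothetical rather than taken from the study.

```python
from collections import Counter


def char_ngrams(text, n):
    """Multiset of character n-grams, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: F-beta over averaged character n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to have n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta = 2 weights recall more heavily than precision.
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


# Hypothetical Portuguese outputs scored against one reference.
reference = "o gato dorme no sofá"
exact = simple_chrf("o gato dorme no sofá", reference)
partial = simple_chrf("o gato dormiu no sofá", reference)
```

An exact match scores 100, while the near-miss scores lower but well above zero. This partial credit for shared character sequences is why chrF is often reported for morphologically rich languages such as Portuguese.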

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.