AMALIA: A Fully Open Large Language Model for European Portuguese
Summary
AMALIA is a fully open large language model (LLM) designed for European Portuguese (pt-PT), a variety that is underrepresented in existing LLMs and evaluation benchmarks. Developed by Afonso Simplício et al. and presented at PROPOR 2026, AMALIA prioritizes pt-PT by incorporating more high-quality pt-PT data during its mid- and post-training phases. To support accurate evaluation, the researchers also released a suite of pt-PT benchmarks comprising translated standard tasks and four novel datasets. The new datasets target pt-PT generation, linguistic competence, and bias between pt-PT and pt-BR. Experimental results indicate that AMALIA performs comparably to strong baselines on the translated benchmarks while improving substantially on the evaluations tailored to pt-PT.
Key takeaway
If you are developing LLMs for a specific language variant, prioritize creating and using high-quality, variant-specific training data. Also invest in native evaluation benchmarks: machine-translated benchmarks may miss important linguistic and cultural nuances, which can lead to inaccurate performance assessments for the target variant.
Key insights
Targeted training and native benchmarking are crucial for underrepresented language variants like European Portuguese.
Principles
- Prioritize high-quality data for target language variants.
- Develop native benchmarks for accurate evaluation.
Method
AMALIA was developed by integrating more high-quality European Portuguese data during mid- and post-training stages, complemented by a new suite of pt-PT-specific evaluation benchmarks.
In practice
- Use pt-PT-specific datasets for fine-tuning.
- Evaluate with the newly released pt-PT benchmarks.
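One way to act on the evaluation advice is to report scores separately for translated and native benchmarks, so that a gap between the two stays visible instead of being averaged away. The sketch below is illustrative only: the function name `accuracy_by_group`, the record fields, and the group labels are assumptions made for this example and do not come from the AMALIA release, and the records are placeholders rather than real benchmark items or results.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy} for records with 'group', 'prediction', 'gold' keys.

    Hypothetical helper: the record layout is an assumption for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["group"]] += 1
        if record["prediction"] == record["gold"]:
            correct[record["group"]] += 1
    return {group: correct[group] / total[group] for group in total}


# Placeholder records, not real benchmark items or model outputs.
records = [
    {"group": "translated", "prediction": "A", "gold": "A"},
    {"group": "translated", "prediction": "B", "gold": "C"},
    {"group": "native_pt_pt", "prediction": "D", "gold": "D"},
    {"group": "native_pt_pt", "prediction": "A", "gold": "A"},
]

scores = accuracy_by_group(records)
for group, accuracy in scores.items():
    print(f"{group}: {accuracy:.2f}")
```

Keeping the two groups apart in the report makes it easier to notice when a model looks fine on translated tasks but behaves differently on native pt-PT evaluations.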
Topics
- AMALIA LLM
- European Portuguese
- Language Model Training
- Native Benchmarking
- Linguistic Nuances
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.