Biatron: A Parameter-Efficient Small Language Model for Brazilian Portuguese with Integrated Mathematical Reasoning
Summary
Biatron is the first model in the Biatron series: a 345-million-parameter Small Language Model (SLM) optimized for Brazilian Portuguese, with a focus on parameter efficiency and mathematical reasoning. It relies on strategic data curation rather than parameter scaling. Biatron was trained on 300 billion tokens with the Megatron-LM framework, reaching 32% Model FLOPs Utilization (MFU). Its training data follows a 60-30-10 mixture of high-quality Portuguese text from GigaVerbo, chain-of-thought reasoning examples, and mathematical datasets. Biatron achieved an aggregate score of 0.245 on Portuguese benchmarks, close to Tucano-630M's performance with about 45% fewer parameters. On mathematical reasoning tasks it reached 7.5% Pass@1 accuracy, more than double Tucano-2.4B's 3.5%, at roughly one-seventh the parameter count. The model weights, training logs, and checkpoints are publicly available.
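The comparisons above reduce to simple arithmetic. The short Python sketch below recomputes them from the figures quoted in this summary. The variable and function names are illustrative, not taken from the paper, and the tokens-per-parameter figure is a derived quantity that the summary does not state.

```python
# Figures quoted in the summary; all names here are illustrative.
BIATRON_PARAMS = 345e6        # 345 million parameters
TUCANO_630M_PARAMS = 630e6    # 630 million parameters
TRAINING_TOKENS = 300e9       # 300 billion training tokens
BIATRON_PASS_AT_1 = 7.5       # percent, mathematical reasoning
TUCANO_2B4_PASS_AT_1 = 3.5    # percent, mathematical reasoning


def relative_reduction(smaller: float, larger: float) -> float:
    """Fraction by which `smaller` is below `larger`."""
    return 1.0 - smaller / larger


# Share of parameters saved relative to Tucano-630M.
param_reduction = relative_reduction(BIATRON_PARAMS, TUCANO_630M_PARAMS)

# How many times higher Biatron's Pass@1 is than Tucano-2.4B's.
accuracy_ratio = BIATRON_PASS_AT_1 / TUCANO_2B4_PASS_AT_1

# Derived quantity (not stated in the summary): training tokens per parameter.
tokens_per_param = TRAINING_TOKENS / BIATRON_PARAMS

print(f"Parameter reduction vs Tucano-630M: {param_reduction:.1%}")
print(f"Pass@1 ratio vs Tucano-2.4B: {accuracy_ratio:.2f}x")
print(f"Training tokens per parameter: {tokens_per_param:.0f}")
```

The results line up with the "45% fewer parameters" and "more than double" claims, and show that the model saw several hundred training tokens per parameter.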
Key takeaway
For AI engineers building language models for lower-resource languages such as Brazilian Portuguese, Biatron shows that a carefully curated data mixture can outperform much larger models on specialized tasks such as mathematical reasoning. Before reaching for a larger model, consider tuning your data mixture to combine high-quality text, chain-of-thought examples, and domain-specific datasets; doing so can reduce computational cost and improve task-specific accuracy.
Key insights
Strategic data curation can outperform brute-force parameter scaling for SLMs, especially in specialized domains.
Principles
- Data quality and mixture are critical for SLM performance.
- Parameter efficiency is achievable through targeted data strategies.
Method
Biatron was trained with Megatron-LM on 300 billion tokens drawn from a 60-30-10 data mixture of GigaVerbo text, chain-of-thought examples, and mathematical datasets.
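To make the mixture concrete, here is a minimal Python sketch of how a 60-30-10 split translates into per-source token budgets and weighted sampling. It assumes the three percentages map to the sources in the order listed (GigaVerbo text, chain-of-thought, mathematics), which this summary implies but does not state outright. The source names and helper functions are hypothetical. This is not Megatron-LM's dataset blending code.

```python
import random

TOTAL_TOKENS = 300_000_000_000  # 300 billion tokens, as stated in the summary

# Assumption: percentages follow the order in which the sources are listed.
# The dictionary keys are hypothetical labels, not dataset identifiers.
MIXTURE_PERCENT = {
    "gigaverbo_text": 60,
    "chain_of_thought": 30,
    "math": 10,
}


def token_budgets(total_tokens: int, mixture_percent: dict) -> dict:
    """Split a total token budget across sources by integer percentages."""
    if sum(mixture_percent.values()) != 100:
        raise ValueError("mixture percentages must sum to 100")
    return {
        name: total_tokens * pct // 100
        for name, pct in mixture_percent.items()
    }


def sample_sources(mixture_percent: dict, n: int, seed: int = 0) -> list:
    """Draw n source names with probability proportional to their share."""
    rng = random.Random(seed)
    names = list(mixture_percent)
    weights = [mixture_percent[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


budgets = token_budgets(TOTAL_TOKENS, MIXTURE_PERCENT)
draws = sample_sources(MIXTURE_PERCENT, 10_000)

for name, tokens in budgets.items():
    print(f"{name}: {tokens / 1e9:.0f}B tokens")
```

Under that assumption the budgets come to 180B, 90B, and 30B tokens. Sampling by source weight for each batch, rather than concatenating the datasets, keeps every batch close to the target proportions throughout training.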
In practice
- Prioritize data quality over parameter count for SLMs.
- Integrate diverse data types for specialized capabilities.
Topics
- Biatron
- Small Language Models
- Brazilian Portuguese
- Mathematical Reasoning
- Parameter Efficiency
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.