Training Azerbaijani language models on Amazon SageMaker AI

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Cloud Computing & IT Infrastructure) · Depth: Advanced, long

Summary

Azercell Telecom LLC, in collaboration with the AWS Generative AI Innovation Center, developed a production-ready framework on Amazon SageMaker AI for training Azerbaijani large language models. The six-week project addressed the challenge of adapting foundation models to a morphologically rich, low-resource language. The solution achieved 23% higher training throughput and 58% lower peak GPU memory usage on an ml.p5.48xlarge instance. A custom monolingual tokenizer roughly halved the encoding cost, from 3.22 to 1.59 tokens per word, so about twice as much Azerbaijani text fits in the same context window. The framework also used continued pre-training of Llama 3.2 1B with PyTorch FSDP and Liger Kernel optimizations, which reduced per-GPU memory from 9.23 GB to 1.17 GB. Supervised fine-tuning with LoRA then turned the model into a coherent conversational assistant.
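The tokenizer figures above can be checked with a little arithmetic. The sketch below uses the two tokens-per-word values from the summary; the context length of 8192 tokens is an assumed, illustrative value and not a figure from the original article, and the helper names are hypothetical.

```python
# Tokens-per-word figures reported in the summary.
BASELINE_TPW = 3.22   # original tokenizer
CUSTOM_TPW = 1.59     # custom monolingual tokenizer

# Assumption: an illustrative context budget, not taken from the article.
CONTEXT_TOKENS = 8192


def capacity_gain(baseline_tpw, custom_tpw):
    """How many times more words fit in the same token budget."""
    return baseline_tpw / custom_tpw


def words_that_fit(context_tokens, tokens_per_word):
    """Whole words that fit in a fixed token budget at a given tokens-per-word rate."""
    return int(context_tokens // tokens_per_word)


gain = capacity_gain(BASELINE_TPW, CUSTOM_TPW)
print(f"capacity gain: {gain:.2f}x")
print("words with baseline tokenizer:", words_that_fit(CONTEXT_TOKENS, BASELINE_TPW))
print("words with custom tokenizer:  ", words_that_fit(CONTEXT_TOKENS, CUSTOM_TPW))
```

The ratio depends only on the two tokens-per-word values, so the roughly twofold gain holds for any context length.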

Key takeaway

NLP engineers developing LLMs for low-resource or morphologically rich languages should prioritize a custom tokenizer, which in this project roughly doubled the amount of text that fits in the context window. Training with PyTorch FSDP and Liger Kernel on Amazon SageMaker AI cut peak GPU memory usage by 58% and raised training throughput by 23%. This approach enables efficient adaptation of foundation models and scalable deployment for conversational AI applications.

Key insights

Optimizing LLM training for low-resource, morphologically rich languages requires custom tokenization and GPU memory optimizations.

Method

Develop a custom byte-level BPE (BBPE) tokenizer, then perform two-phase continued pre-training (embedding adaptation, then full training) with FSDP and Liger Kernel, followed by LoRA-based supervised fine-tuning.
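To make the first step concrete, here is a toy sketch of how byte-level BPE learns merges. It is a pure-Python illustration of the general technique, not the tokenizer built in the project; the function names, the tiny corpus, and the merge count are all hypothetical, and a real tokenizer would be trained with a dedicated library on a large corpus.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged token."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def train_bbpe(corpus, num_merges):
    """Learn up to `num_merges` byte-pair merges from a list of strings."""
    # Every word starts as a sequence of single UTF-8 bytes, so any
    # character (including Azerbaijani letters like "ı") is representable.
    words = Counter()
    for text in corpus:
        for word in text.split():
            words[tuple(bytes([b]) for b in word.encode("utf-8"))] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single token
        # Most frequent pair wins; ties are broken deterministically.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[tuple(apply_merge(symbols, best))] += freq
        words = merged_words
    return merges


def encode(word, merges):
    """Tokenize one word by replaying the learned merges in order."""
    symbols = [bytes([b]) for b in word.encode("utf-8")]
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


# Hypothetical toy corpus: inflected forms of "kitab" (book).
corpus = ["kitab kitablar kitablarım kitabda"]
merges = train_bbpe(corpus, num_merges=10)
tokens = encode("kitablarım", merges)
print(len("kitablarım".encode("utf-8")), "bytes ->", len(tokens), "tokens")
```

Because shared stems and suffixes recur across inflected forms, merges learned on in-language text compress those forms into fewer tokens, which is the effect a monolingual tokenizer exploits in a morphologically rich language.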

Best for: Machine Learning Engineer, NLP Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.