Tokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Study
Summary
A comparative study benchmarked seven foundation models from five providers on 273 Ukrainian court decisions from the EDRSR (Ukraine's Unified State Register of Court Decisions), measuring tokenizer fertility and zero-shot performance across three tasks. Tokenizer fertility varied by 1.6x: Qwen3 models consumed 60% more tokens than Llama-family models for the same input, which directly raises API costs. NVIDIA Nemotron Super 3 (120B) achieved the highest composite score, 83.1, surpassing Mistral Large 3 (675B total, 41B active) even though Mistral Large 3 has far more total parameters and costs three times as much via API. Few-shot prompting consistently degraded performance, by up to 26 percentage points, and ablations with Ukrainian-language demonstrations confirmed the effect.
Key takeaway
If your team is evaluating foundation models for Ukrainian legal text processing, include tokenizer efficiency analysis in your model selection process to manage API costs. Default to zero-shot prompting as well: in this study, few-shot examples significantly degraded performance on Ukrainian, a morphologically rich language, contrary to common intuition. Together, these two steps help control cost without sacrificing accuracy.
Key insights
Tokenizer efficiency and zero-shot performance vary significantly across foundation models on Ukrainian legal text.
Principles
- Tokenizer fertility impacts API costs.
- Zero-shot often outperforms few-shot for morphologically rich languages.
Method
The study benchmarks models on Ukrainian legal text, measuring tokenizer fertility and zero-shot performance, and then runs stratified and prompt-sensitivity ablations.
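The summary does not define how fertility was computed. A minimal sketch follows, assuming the common definition of fertility as average tokens per whitespace-delimited word. The function `fertility` and the stand-in `toy_char_pair_tokenizer` are hypothetical names used for illustration. In practice you would pass each model's real tokenizer as the `tokenize` callable.

```python
from typing import Callable, Iterable, List


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-delimited word.

    Assumption: words are split on whitespace; the study may use a
    different word segmentation.
    """
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("input contains no words")
    return total_tokens / total_words


def toy_char_pair_tokenizer(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: cuts each word into 2-character chunks."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(word[i:i + 2] for i in range(0, len(word), 2))
    return tokens


sample = ["суд ухвалив рішення"]
toy_fertility = fertility(sample, toy_char_pair_tokenizer)
word_level_fertility = fertility(sample, str.split)
```

Running the same corpus through each candidate model's tokenizer and comparing the resulting values gives the kind of ratio the study reports, such as the 1.6x spread between model families.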
In practice
- Prioritize tokenizer analysis before model selection.
- Use zero-shot prompting as a default for Ukrainian language tasks.
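To see how fertility feeds into cost, the sketch below projects input-token spend from a word count, a fertility value, and a price per million tokens. All numbers are illustrative assumptions, not figures from the study: the fertility values 2.0 and 3.2 are chosen only because their ratio matches the reported 1.6x spread, and the price is a placeholder. Output tokens and per-request overhead are ignored.

```python
def projected_input_cost(words: int, fertility: float, usd_per_million_tokens: float) -> float:
    """Estimated input cost in USD: words * tokens-per-word * price-per-token."""
    return words * fertility * usd_per_million_tokens / 1_000_000


def relative_token_overhead(fertility_a: float, fertility_b: float) -> float:
    """Fraction of extra tokens tokenizer A needs compared with tokenizer B."""
    return fertility_a / fertility_b - 1.0


# Hypothetical inputs, for illustration only.
corpus_words = 1_000_000
efficient_fertility = 2.0   # assumed value for a more efficient tokenizer
costly_fertility = 3.2      # assumed value, 1.6x the efficient one
price = 1.0                 # placeholder USD per million input tokens

cost_efficient = projected_input_cost(corpus_words, efficient_fertility, price)
cost_costly = projected_input_cost(corpus_words, costly_fertility, price)
overhead = relative_token_overhead(costly_fertility, efficient_fertility)
```

At an equal per-token price, a 1.6x fertility ratio means 60% more spend on the same text, so a model with a lower list price can still cost more once tokenizer efficiency is taken into account.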
Topics
- Foundation Models
- Ukrainian Legal Text
- Tokenizer Fertility
- Zero-Shot Performance
- Few-Shot Prompting
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.