CARTE: A Benchmark for Mapping Language Model Knowledge Across France
Summary
CARTE (Culturally Anchored Regional-Territorial Evaluation) is a multiple-choice benchmark that assesses how well large language models (LLMs) reason over geographically specific, regionally differentiated knowledge within France. Existing benchmarks focus mainly on national-level cultural understanding; CARTE targets the intra-country variation they overlook. It comprises 2,431 questions spanning France's 13 metropolitan regions and 14 thematic domains, including culture, language, demographics, economy, environment, and mobility. A subset, CARTE-LV, specifically targets linguistic variation across French regions. Evaluations of 27 LLMs ranging from 1B to 12B parameters, run in few-shot settings, revealed significant performance disparities across regions and model scales, suggesting systematic gaps in pretraining data coverage and limited robustness to intra-national variation.
Key takeaway
If you are an AI scientist or machine learning engineer deploying LLMs in applications that require nuanced geographical understanding, critically evaluate your models' performance on intra-country regional knowledge. Your current LLMs likely have systematic gaps in pretraining coverage, which lead to inconsistent accuracy across regions. Consider fine-tuning on geographically diverse datasets or integrating knowledge graphs to improve regional robustness, especially for applications targeting specific local populations or cultural contexts.
Key insights
LLMs exhibit significant gaps in fine-grained, region-specific knowledge within a single country.
Principles
- Intra-country knowledge varies significantly.
- Pretraining data has regional blind spots.
- Model scale doesn't guarantee regional robustness.
Method
CARTE is a multiple-choice benchmark of 2,431 questions across 13 French regions and 14 themes, with a linguistic variation subset (CARTE-LV), built to evaluate LLMs' regional knowledge in few-shot settings.
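To make the setup concrete, here is a minimal Python sketch of how a region- and theme-tagged multiple-choice item might be represented and turned into a few-shot prompt. The `Item` fields, the prompt template, the theme tag, and the two sample questions are assumptions made for illustration; none of them is taken from CARTE's actual schema or data.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

LETTERS = "ABCD"  # assumes four answer options per question


@dataclass(frozen=True)
class Item:
    """Hypothetical record for one benchmark question (not CARTE's real schema)."""
    question: str
    choices: Tuple[str, ...]
    answer: int   # index into `choices`
    region: str   # region tag
    theme: str    # thematic-domain tag


def format_item(item: Item, with_answer: bool) -> str:
    """Render one item; solved shots include the answer letter, the target does not."""
    lines = [f"Question: {item.question}"]
    for letter, choice in zip(LETTERS, item.choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:" + (f" {LETTERS[item.answer]}" if with_answer else ""))
    return "\n".join(lines)


def build_prompt(shots: Sequence[Item], target: Item) -> str:
    """Concatenate solved examples followed by the unsolved target question."""
    blocks = [format_item(shot, True) for shot in shots]
    blocks.append(format_item(target, False))
    return "\n\n".join(blocks)


# Illustrative items written for this sketch, not drawn from the benchmark.
shot = Item(
    "Which city is the prefecture of the Bretagne region?",
    ("Nantes", "Rennes", "Brest", "Quimper"),
    1, "Bretagne", "administration",
)
target = Item(
    "Which city is the prefecture of the Occitanie region?",
    ("Montpellier", "Toulouse", "Albi", "Perpignan"),
    1, "Occitanie", "administration",
)

prompt = build_prompt([shot], target)
print(prompt)
```

Keeping the region and theme on each item, rather than in separate files, makes it straightforward to slice results along either axis later.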
In practice
- Test LLMs for regional knowledge gaps.
- Supplement pretraining with regional data.
- Use CARTE-LV for linguistic variation.
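One way to act on the first point is to break accuracy down by region and flag regions that fall well below the average. The sketch below is a hedged illustration: the function names, the 10-point margin, and the placeholder outcomes for "Region A" to "Region C" are all assumptions, not results or tooling from the CARTE paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def accuracy_by_region(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute accuracy per region from (region, is_correct) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for region, is_correct in results:
        totals[region] += 1
        correct[region] += int(is_correct)
    return {region: correct[region] / totals[region] for region in totals}


def weak_regions(per_region: Dict[str, float], margin: float = 0.10) -> List[str]:
    """Return regions whose accuracy is more than `margin` below the
    unweighted mean of the per-region accuracies (an arbitrary threshold)."""
    mean = sum(per_region.values()) / len(per_region)
    return sorted(r for r, acc in per_region.items() if acc < mean - margin)


# Synthetic placeholder outcomes, invented purely to exercise the functions.
results = [
    ("Region A", True), ("Region A", True), ("Region A", False), ("Region A", True),
    ("Region B", False), ("Region B", False), ("Region B", True), ("Region B", False),
    ("Region C", True), ("Region C", True), ("Region C", True), ("Region C", False),
]

per_region = accuracy_by_region(results)
flagged = weak_regions(per_region)
print(per_region)
print(flagged)
```

The unweighted mean treats every region equally regardless of how many questions it has; weighting by question count is an equally defensible choice if regions are unevenly represented.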
Topics
- LLM Evaluation
- Regional Knowledge
- Geographic Benchmarking
- Cultural Understanding
- Linguistic Variation
- Pretraining Data Gaps
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.