CARTE: A Benchmark for Mapping Language Model Knowledge Across France

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Advanced, medium

Summary

CARTE (Culturally Anchored Regional-Territorial Evaluation) is a multiple-choice benchmark that tests whether large language models (LLMs) can reason over fine-grained, region-specific knowledge within France. Existing benchmarks mostly measure cultural understanding at the national level and overlook variation inside a country; CARTE targets that gap. It comprises 2,431 questions spanning France's 13 metropolitan regions and 14 thematic domains, including culture, language, demographics, economy, environment, and mobility. A subset, CARTE-LV, specifically targets linguistic variation across French regions. Evaluations of 27 LLMs ranging from 1B to 12B parameters, run in few-shot settings, revealed significant performance disparities across regions and model scales, which suggests systematic gaps in pretraining data coverage and limited robustness to intra-national variation.

Key takeaway

If you are an AI scientist or machine learning engineer deploying LLMs in applications that require nuanced geographical understanding, evaluate your models on intra-country regional knowledge rather than on national-level benchmarks alone. The CARTE results suggest that your models likely have systematic gaps in pretraining coverage, which lead to inconsistent accuracy across regions. Consider fine-tuning on geographically diverse datasets or integrating knowledge graphs to improve regional robustness, especially for applications that target specific local populations or cultural contexts.
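A per-region breakdown is the simplest way to surface the kind of inconsistency described here. The sketch below is illustrative only: the record layout `(region, gold, predicted)` and the function names `accuracy_by_region` and `regional_gap` are assumptions, not part of CARTE, and the records are toy placeholders rather than benchmark data.

```python
from collections import defaultdict


def accuracy_by_region(records):
    """Return {region: accuracy} from (region, gold, predicted) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for region, gold, predicted in records:
        total[region] += 1
        correct[region] += int(gold == predicted)
    return {region: correct[region] / total[region] for region in total}


def regional_gap(per_region):
    """Spread between the best- and worst-scoring regions."""
    return max(per_region.values()) - min(per_region.values())


# Toy placeholder predictions (hypothetical, not CARTE results).
records = [
    ("Bretagne", "A", "A"),
    ("Bretagne", "C", "B"),
    ("Occitanie", "B", "B"),
    ("Occitanie", "D", "D"),
]

per_region = accuracy_by_region(records)
gap = regional_gap(per_region)
print(per_region, gap)
```

A single aggregate accuracy would hide the spread that `regional_gap` reports, which is why the breakdown is worth computing before deployment to a specific local population.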

Key insights

LLMs show significant gaps in fine-grained, region-specific knowledge within a single country, and their accuracy varies with both region and model scale.

Principles

Method

CARTE is a multiple-choice benchmark of 2,431 questions across 13 French regions and 14 themes, with a linguistic variation subset (CARTE-LV), built to evaluate LLMs' regional knowledge.
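To make the few-shot multiple-choice setup concrete, here is a minimal sketch of how such a prompt might be assembled. The summary does not describe CARTE's item schema or prompt template, so everything here is an assumption: the `Item` fields, the four-option A to D layout, and the "Question/Answer" template are hypothetical, and the two example questions are invented for illustration rather than taken from the benchmark.

```python
from dataclasses import dataclass

LETTERS = "ABCD"  # assumption: four options per question


@dataclass
class Item:
    region: str
    question: str
    options: list[str]
    answer: str  # gold option letter, e.g. "B"


def format_item(item, with_answer):
    """Render one question; leave the answer blank for the target item."""
    lines = [f"Question: {item.question}"]
    for letter, option in zip(LETTERS, item.options):
        lines.append(f"{letter}. {option}")
    lines.append(f"Answer: {item.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(shots, target):
    """Concatenate solved examples, then the unsolved target question."""
    blocks = [format_item(shot, with_answer=True) for shot in shots]
    blocks.append(format_item(target, with_answer=False))
    return "\n\n".join(blocks)


# Invented items for illustration only (not from CARTE).
shot = Item(
    region="Bretagne",
    question="Which city is the prefecture of the Bretagne region?",
    options=["Brest", "Rennes", "Nantes", "Quimper"],
    answer="B",
)
target = Item(
    region="Occitanie",
    question="In which region is the city of Toulouse located?",
    options=["Occitanie", "Nouvelle-Aquitaine", "Grand Est", "Normandie"],
    answer="A",
)

prompt = build_few_shot_prompt([shot], target)
print(prompt)
```

The model's completion after the final "Answer:" would then be compared with `target.answer`; keeping `region` on each item is what allows scores to be grouped by region afterwards.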

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.