Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

2026-02-20 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A new study introduces DemosQA, a dataset for Greek Question Answering (QA) that addresses the bias of Large Language Models (LLMs) toward high-resource languages. DemosQA is built from questions posted by social media users and community-reviewed answers, so that it better reflects Greek social and cultural nuances. The research also presents a memory-efficient LLM evaluation framework that can be adapted to other QA datasets and languages. The study evaluates 11 LLMs, both monolingual and multilingual, on 6 human-curated Greek QA datasets using 3 prompting strategies. The aim is to assess model effectiveness on language-specific tasks for under-resourced languages such as Greek, where comparisons between monolingual and multilingual models have been less explored.

Key takeaway

Research scientists developing or deploying LLMs for specific linguistic and cultural contexts should prioritize evaluating monolingual models against multilingual ones on culturally relevant datasets. The study highlights that relying solely on transfer learning from high-resource languages can misrepresent local social and cultural aspects. Consider creating or using language-specific datasets such as DemosQA to check that your models capture those aspects, which improves relevance and performance.

Key insights

Monolingual LLMs for under-resourced languages need to be evaluated directly against their multilingual counterparts.

Principles

LLM training data is often biased toward high-resource languages.
Transfer learning from high-resource languages can misrepresent the social and cultural aspects of under-resourced ones.

Method

Construct DemosQA from social media questions and community answers. Evaluate 11 LLMs on 6 Greek QA datasets using 3 prompting strategies with a memory-efficient framework.
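
The evaluation described above is a grid: 11 models × 6 datasets × 3 prompting strategies gives 198 configurations. The following is a minimal sketch of how such a grid could be organized, not the paper's framework. All names (QAExample, exact_match, evaluate_grid, the two prompt builders, toy_loader) are hypothetical. Exact-match scoring and loading one model at a time are assumptions standing in for the paper's metrics and memory-saving technique, which this summary does not describe.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class QAExample:
    """Hypothetical QA item: a question and one reference answer."""
    question: str
    answer: str


# Assumption: a "model" is any callable that maps a prompt to an answer string.
Model = Callable[[str], str]
PromptBuilder = Callable[[QAExample], str]


def exact_match(prediction: str, reference: str) -> bool:
    """Assumed metric: case- and whitespace-insensitive exact match."""
    return prediction.strip().casefold() == reference.strip().casefold()


def score(model: Model, build_prompt: PromptBuilder,
          examples: List[QAExample]) -> float:
    """Fraction of examples the model answers correctly under one strategy."""
    if not examples:
        return 0.0
    hits = sum(exact_match(model(build_prompt(ex)), ex.answer) for ex in examples)
    return hits / len(examples)


def evaluate_grid(
    model_loaders: Dict[str, Callable[[], Model]],
    datasets: Dict[str, List[QAExample]],
    prompt_builders: Dict[str, PromptBuilder],
) -> Dict[Tuple[str, str, str], float]:
    """Score every (model, dataset, strategy) combination.

    Only one model is held in memory at a time (an assumed way to keep
    memory use low): it is loaded, run on all datasets and strategies,
    then released before the next one is loaded.
    """
    results: Dict[Tuple[str, str, str], float] = {}
    for model_name, load in model_loaders.items():
        model = load()
        for (ds_name, examples), (strategy, build) in product(
            datasets.items(), prompt_builders.items()
        ):
            results[(model_name, ds_name, strategy)] = score(model, build, examples)
        del model  # drop the reference before loading the next model
    return results


# Placeholder prompting strategies; the paper's 3 strategies are not listed here.
def zero_shot(ex: QAExample) -> str:
    return ex.question


def instructed(ex: QAExample) -> str:
    return "Answer briefly: " + ex.question


def toy_loader() -> Model:
    """Stand-in for loading a real LLM: it knows one answer only."""
    known = {"2+2?": "4"}

    def model(prompt: str) -> str:
        for question, answer in known.items():
            if question in prompt:
                return answer
        return ""

    return model


toy_results = evaluate_grid(
    {"toy": toy_loader},
    {"tiny": [QAExample("2+2?", "4"), QAExample("Capital of Greece?", "Athens")]},
    {"zero_shot": zero_shot, "instructed": instructed},
)
print(toy_results)
```

Passing loaders rather than loaded models keeps the grid loop independent of how a model is obtained, so a real loader for a local or hosted LLM could replace toy_loader without changing evaluate_grid.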

In practice

Use DemosQA for Greek QA benchmarking.
Employ the evaluation framework for diverse languages.

Topics

Greek Question Answering
Large Language Models
DemosQA Dataset
Multilingual LLMs
LLM Evaluation

Best for: Research Scientist, AI Researcher, NLP Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.