Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark
Summary
A new study introduces DemosQA, a dataset for Greek Question Answering (QA) that addresses the bias of Large Language Models (LLMs) toward high-resource languages. DemosQA is built from questions posted by social media users and community-reviewed answers, so that it better reflects Greek social and cultural nuances. The study also presents a memory-efficient LLM evaluation framework that can be adapted to other QA datasets and languages. It evaluates 11 LLMs, both monolingual and multilingual, on 6 human-curated Greek QA datasets using 3 prompting strategies. The goal is to assess how effective these models are on language-specific tasks in under-resourced languages such as Greek, where comparisons between monolingual and multilingual models have been less explored.
Key takeaway
Research scientists developing or deploying LLMs for a specific linguistic and cultural context should prioritize evaluating monolingual models against multilingual ones on culturally relevant datasets. The study highlights that relying solely on transfer learning from high-resource languages can misrepresent a language's social and cultural aspects. Consider creating or using language-specific datasets like DemosQA to check that your models accurately capture those local aspects, which improves relevance and performance.
Key insights
Monolingual LLMs for under-resourced languages need to be evaluated directly against their multilingual counterparts on language-specific tasks.
Principles
- LLM training data is often biased toward widely used, high-resource languages.
- Transfer learning from high-resource languages can misrepresent social and cultural aspects.
Method
Construct DemosQA from social media user questions and community-reviewed answers. Evaluate 11 monolingual and multilingual LLMs on 6 human-curated Greek QA datasets using 3 prompting strategies within a memory-efficient framework.
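This summary does not describe how the paper's framework works internally, so the following is only a minimal sketch of what a models × datasets × prompting strategies evaluation grid could look like. Everything in it is an assumption made for illustration: the function and type names are hypothetical, a "model" is reduced to a plain callable from prompt to answer, scoring uses a simple normalized exact match, and "memory-efficient" is interpreted as keeping only one model loaded at a time. None of this is taken from the study itself.

```python
import gc
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only (not from the paper).
Model = Callable[[str], str]      # maps a prompt to a generated answer
Example = Tuple[str, str]         # (question, gold answer)
Loader = Callable[[], Model]      # builds a model on demand


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, gold: str) -> bool:
    """Assumed metric: normalized exact match."""
    return normalize(prediction) == normalize(gold)


def evaluate_grid(
    model_loaders: Dict[str, Loader],
    datasets: Dict[str, List[Example]],
    prompt_templates: Dict[str, str],
) -> Dict[Tuple[str, str, str], float]:
    """Return accuracy for every (model, dataset, prompt) combination."""
    results: Dict[Tuple[str, str, str], float] = {}
    for model_name, load in model_loaders.items():
        model = load()  # assumption: only one model is resident at a time
        for ds_name, examples in datasets.items():
            for p_name, template in prompt_templates.items():
                correct = sum(
                    exact_match(model(template.format(question=q)), a)
                    for q, a in examples
                )
                key = (model_name, ds_name, p_name)
                results[key] = correct / len(examples) if examples else 0.0
        del model       # drop the reference before loading the next model
        gc.collect()
    return results


if __name__ == "__main__":
    # Toy stand-ins: a fake "model" and a two-question Greek dataset.
    def load_toy_model() -> Model:
        return lambda prompt: "Αθήνα" if "πρωτεύουσα" in prompt else "?"

    toy_data = {
        "toy_qa": [
            ("Ποια είναι η πρωτεύουσα της Ελλάδας;", "Αθήνα"),
            ("Πόσα νησιά έχει η Ελλάδα;", "6000"),
        ]
    }
    templates = {
        "plain": "{question}",
        "instructed": "Answer briefly in Greek.\nQuestion: {question}\nAnswer:",
    }
    for key, acc in evaluate_grid({"toy": load_toy_model}, toy_data, templates).items():
        print(key, acc)
```

Passing loaders rather than loaded models is the design choice that keeps peak memory bounded in this sketch: each model is created, scored on every dataset and prompt, and released before the next one is built. With real models, releasing accelerator memory would need additional framework-specific steps that are not shown here.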
In practice
- Use DemosQA for Greek QA benchmarking.
- Adapt the evaluation framework to other QA datasets and languages.
Topics
- Greek Question Answering
- Large Language Models
- DemosQA Dataset
- Multilingual LLMs
- LLM Evaluation
Best for: Research Scientist, AI Researcher, NLP Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.