Indi-RomCoM: Code-Mixed Benchmark for Evaluating LLMs on Romanized Indic-English Instructions

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

The Indi-RomCoM benchmark addresses a gap in how Large Language Models (LLMs) are evaluated on Romanized Code Mixing (RCM), a prevalent communication style that blends local languages with English, written in Roman script. The benchmark supports systematic evaluation across seven instruction-following tasks, four widely spoken Indic languages, and three controlled code-mixing intensity levels. Extensive evaluation of proprietary, open-weight, and Indic-focused LLMs in zero- and few-shot settings shows consistent underperformance on RCM instructions, and performance degrades notably as code-mixing density increases. Reasoning tasks degrade less than detection tasks such as Toxicity, primarily because the explanations the models generate provide the context needed to interpret the input.

Key takeaway

NLP engineers developing or deploying multilingual LLMs should prioritize robust handling of Romanized Code Mixing (RCM). Current LLMs consistently underperform on RCM instructions, and performance worsens as code-mixing density increases. Focus development effort on improving RCM comprehension, particularly for detection tasks, and examine how generated explanations help limit degradation on reasoning tasks.

Key insights

LLMs consistently underperform on Romanized Code-Mixed instructions, with performance degrading as code-mixing density increases.
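
The summary does not say how Indi-RomCoM quantifies code-mixing density. One widely used measure is the Code-Mixing Index (CMI) of Das and Gambäck (2014): CMI = 100 × (1 − max_i w_i / (n − u)) when n > u, else 0, where w_i counts tokens in language i, n is the total token count, and u counts language-independent tokens. The sketch below is an illustration of that metric under stated assumptions, not the benchmark's own definition of its intensity levels: it assumes per-token language tags are already available, and the tag names `indic`, `en`, and `univ` are hypothetical.

```python
from collections import Counter


def code_mixing_index(lang_tags: list[str], neutral_tag: str = "univ") -> float:
    """Code-Mixing Index (Das & Gambäck, 2014) over per-token language tags.

    CMI = 100 * (1 - max_i(w_i) / (n - u)) if n > u, else 0.
    Tag names are hypothetical; `neutral_tag` marks language-independent
    tokens such as punctuation, numbers, or named entities.
    """
    n = len(lang_tags)
    # Count only language-specific tokens (w_i per language i).
    counts = Counter(tag for tag in lang_tags if tag != neutral_tag)
    u = n - sum(counts.values())  # language-independent tokens
    if n == u:  # no language-specific tokens, so no mixing to measure
        return 0.0
    return 100.0 * (1.0 - max(counts.values()) / (n - u))


# Toy input: three Indic tokens, one English token, one neutral token.
print(code_mixing_index(["indic", "indic", "en", "indic", "univ"]))  # 25.0
```

A monolingual sentence scores 0, and an even two-language split scores 50, so higher values correspond to denser mixing.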

Method

The Indi-RomCoM benchmark evaluates LLMs on seven instruction-following tasks across four Indic languages and three code-mixing intensity levels.
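
The evaluation design above is a full grid: each model is scored on every combination of task, language, intensity level, and prompting setting (zero- or few-shot), giving 7 × 4 × 3 × 2 = 168 cells per model. The following minimal sketch enumerates that grid and computes how much a score drops relative to the lowest mixing level. Task, language, and level names are hypothetical placeholders, since the summary gives only counts, and the scores in the usage example are toy numbers, not benchmark results.

```python
from itertools import product

# Hypothetical placeholders: the summary gives counts, not names.
TASKS = [f"task_{i}" for i in range(1, 8)]       # seven instruction-following tasks
LANGUAGES = [f"lang_{i}" for i in range(1, 5)]   # four Indic languages
LEVELS = ["low", "medium", "high"]               # three code-mixing intensity levels
SHOTS = ["zero-shot", "few-shot"]                # prompting settings


def evaluation_grid() -> list[tuple[str, str, str, str]]:
    """Every (task, language, level, shot) cell a single model is scored on."""
    return list(product(TASKS, LANGUAGES, LEVELS, SHOTS))


def relative_drop(scores_by_level: dict[str, float],
                  reference: str = "low") -> dict[str, float]:
    """Percentage drop of each level's score relative to the reference level."""
    base = scores_by_level[reference]
    if base == 0:
        raise ValueError("reference score must be non-zero")
    return {lvl: 100.0 * (base - s) / base for lvl, s in scores_by_level.items()}


print(len(evaluation_grid()))  # 168
# Toy scores for one task/language/shot slice (illustrative only).
print(relative_drop({"low": 0.5, "medium": 0.375, "high": 0.25}))
# {'low': 0.0, 'medium': 25.0, 'high': 50.0}
```

Reporting drops relative to the lowest level, rather than raw scores, makes it easier to compare how steeply different task types (for example, reasoning versus detection) degrade as mixing density rises.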

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.