Indi-RomCoM: Code-Mixed Benchmark for Evaluating LLMs on Romanized Indic-English Instructions

2026-06-29 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

The Indi-RomCoM benchmark addresses a gap in evaluating Large Language Models (LLMs) on Romanized Code Mixing (RCM), a prevalent communication style that blends local languages with English in Roman script. The benchmark supports systematic evaluation across seven instruction-following tasks, four widely spoken Indic languages, and three controlled code-mixing intensity levels. Evaluation of proprietary, open-weight, and Indic-focused LLMs under zero- and few-shot settings shows consistent underperformance on RCM instructions, and performance degrades further as code-mixing density increases. Reasoning tasks degrade less than detection tasks such as Toxicity, primarily because the generated explanations provide necessary context.

Key takeaway

If you develop or deploy multilingual Large Language Models, prioritize robust handling of Romanized Code Mixing (RCM). Current LLMs consistently underperform on RCM instructions, and the gap widens as code-mixing density increases. Focus development effort on RCM comprehension, particularly for detection tasks, and consider whether generated explanations can limit degradation the way they appear to in reasoning tasks.

Key insights

LLMs consistently underperform on Romanized Code-Mixed instructions, with performance degrading as code-mixing density increases.

Principles

Romanized Code Mixing is a dominant multilingual communication form.
LLM performance on RCM degrades with increased code-mixing density.
Reasoning tasks are more robust than detection tasks in RCM contexts.

Method

The Indi-RomCoM benchmark evaluates LLMs on seven instruction-following tasks across four Indic languages and three code-mixing intensity levels.
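
The grid described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the benchmark's actual harness: the summary gives the counts (seven tasks, four languages, three intensity levels) but not the labels, so every task, language, and level name below is a hypothetical placeholder, and the scores are made up purely to exercise the functions.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

# Hypothetical placeholder labels: the summary states only the counts
# (7 tasks, 4 languages, 3 intensity levels), not the actual names.
TASKS = [f"task_{i}" for i in range(1, 8)]
LANGUAGES = [f"lang_{i}" for i in range(1, 5)]
LEVELS = ["low", "medium", "high"]


def evaluation_grid():
    """Return every (task, language, level) condition in the grid."""
    return list(product(TASKS, LANGUAGES, LEVELS))


def mean_score_by_level(results):
    """Average scores per intensity level.

    `results` maps (task, language, level) -> score in [0, 1].
    """
    buckets = defaultdict(list)
    for (_task, _lang, level), score in results.items():
        buckets[level].append(score)
    return {level: mean(scores) for level, scores in buckets.items()}


def drop_from_lowest_level(level_means):
    """Score lost at each level relative to the lowest-intensity level."""
    base = level_means[LEVELS[0]]
    return {level: base - level_means[level] for level in LEVELS}


if __name__ == "__main__":
    grid = evaluation_grid()
    print(len(grid))  # 7 * 4 * 3 = 84 conditions

    # Made-up scores, NOT benchmark results.
    fake_by_level = {"low": 0.8, "medium": 0.7, "high": 0.5}
    fake_results = {cell: fake_by_level[cell[2]] for cell in grid}
    print(drop_from_lowest_level(mean_score_by_level(fake_results)))
```

Reporting the drop relative to the lowest-intensity level, rather than raw scores alone, makes the density effect comparable across models with different baselines. Each condition would also be run under both zero- and few-shot prompting, which doubles the number of runs per model.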

In practice

Systematically evaluate LLMs for Romanized Code-Mixed instruction following.
Develop more inclusive multilingual LLM systems.

Topics

Romanized Code Mixing
LLM Evaluation
Indic Languages
Multilingual NLP
Instruction Following
Benchmark Development

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.