IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, FinTech & Digital Financial Services, Regulatory Affairs & Government Relations · Depth: Expert, quick

Summary

IndiaFinBench is the first publicly available evaluation benchmark designed to assess large language model (LLM) performance on Indian financial regulatory text. It addresses a gap in existing financial NLP benchmarks, which draw primarily on Western financial corpora. The benchmark comprises 406 expert-annotated question-answer pairs from 192 documents issued by the Securities and Exchange Board of India (SEBI) and the Reserve Bank of India (RBI). The pairs cover four task types: regulatory interpretation (174 items), numerical reasoning (92 items), contradiction detection (62 items), and temporal reasoning (78 items). Twelve models were evaluated zero-shot, with accuracy ranging from 70.4% (Gemma 4 E4B) to 89.7% (Gemini 2.5 Flash); all of them outperformed a 60.0% non-specialist human baseline. Numerical reasoning was the most discriminative task, that is, the one that best separated stronger models from weaker ones.
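
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the summary: it confirms that the four task counts sum to 406, computes each task's share of the benchmark, and gives the percentage-point margin of the lowest- and highest-scoring models over the 60.0% human baseline. The task keys are illustrative labels, not identifiers from the benchmark release.

```python
# Task counts as reported in the IndiaFinBench summary (keys are illustrative labels).
TASK_COUNTS = {
    "regulatory_interpretation": 174,
    "numerical_reasoning": 92,
    "contradiction_detection": 62,
    "temporal_reasoning": 78,
}

# Reported zero-shot accuracy extremes and the non-specialist human baseline, in percent.
LOWEST = ("Gemma 4 E4B", 70.4)
HIGHEST = ("Gemini 2.5 Flash", 89.7)
HUMAN_BASELINE = 60.0


def task_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each task's share of the benchmark as a percentage, rounded to 0.1."""
    total = sum(counts.values())
    return {task: round(100 * n / total, 1) for task, n in counts.items()}


def margin_over_baseline(accuracy: float, baseline: float = HUMAN_BASELINE) -> float:
    """Percentage-point gap between a model's accuracy and the human baseline."""
    return round(accuracy - baseline, 1)


if __name__ == "__main__":
    print(f"total items: {sum(TASK_COUNTS.values())}")  # 406
    for task, share in task_shares(TASK_COUNTS).items():
        print(f"{task:28s} {TASK_COUNTS[task]:4d}  {share:5.1f}%")
    for name, acc in (LOWEST, HIGHEST):
        print(f"{name}: +{margin_over_baseline(acc)} pts over human baseline")
```

Regulatory interpretation accounts for roughly 43% of the items, so an overall accuracy figure is weighted toward that task; the per-task breakdown is what exposes differences on the smaller numerical reasoning subset.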

Key takeaway

For AI engineers developing or deploying LLMs for financial services in India, IndiaFinBench offers a way to measure model performance on Indian regulatory text. Use it to check a model's accuracy on SEBI and RBI material before deployment, reporting results per task type and paying particular attention to numerical and temporal reasoning. Benchmark scores can inform compliance and decision-making work in a domain that existing benchmarks have largely overlooked, but they do not by themselves guarantee either.
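
A minimal sketch of per-task scoring, assuming a hypothetical item format (a `task` label and a gold `answer` string) and a simple normalized exact-match rule. The summary does not say how IndiaFinBench grades answers, so `Item`, `normalize`, and `exact_match` are illustrative stand-ins, not the benchmark's official scorer or data format.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical benchmark item: a task label and a gold answer string."""
    task: str
    answer: str


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop commas (so 1,00,000 equals 100000),
    collapse whitespace, and trim surrounding punctuation."""
    text = text.lower().replace(",", "")
    text = re.sub(r"\s+", " ", text).strip()
    return text.strip(" .;:")


def exact_match(prediction: str, gold: str) -> bool:
    """Illustrative scoring rule, not IndiaFinBench's official grader."""
    return normalize(prediction) == normalize(gold)


def per_task_accuracy(items: list[Item], predictions: list[str]) -> dict[str, float]:
    """Accuracy for each task label plus an 'overall' score micro-averaged over all items."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct: dict[str, int] = defaultdict(int)
    seen: dict[str, int] = defaultdict(int)
    for item, pred in zip(items, predictions):
        seen[item.task] += 1
        correct[item.task] += int(exact_match(pred, item.answer))
    scores = {task: correct[task] / seen[task] for task in seen}
    scores["overall"] = sum(correct.values()) / len(items) if items else 0.0
    return scores


if __name__ == "__main__":
    # Toy items and model outputs, invented purely to show the call pattern.
    items = [
        Item("numerical_reasoning", "Rs 1,00,000"),
        Item("numerical_reasoning", "25%"),
        Item("temporal_reasoning", "1 April 2024"),
    ]
    predictions = ["rs 100000", "20%", "1 April 2024."]
    print(per_task_accuracy(items, predictions))
```

On the toy data this reports 0.5 for numerical reasoning, 1.0 for temporal reasoning, and about 0.67 overall. Exact match suits short numeric or date answers; free-text regulatory interpretation answers would likely need a more lenient or model-based grader.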

Key insights

IndiaFinBench is a new benchmark for evaluating LLMs on Indian financial regulatory text, addressing a critical regional gap.

Principles

Method

Domain experts annotated 406 question-answer pairs drawn from 192 SEBI and RBI documents, spanning four task types: regulatory interpretation, numerical reasoning, contradiction detection, and temporal reasoning.
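
The dataset structure described above can be sketched as a typed record. Everything here is an assumption for illustration: the field names (`doc_id`, `regulator`, `task`, `question`, `answer`) and the string labels are hypothetical, not the schema of the published release. Using `Enum` means an unknown task or regulator label fails fast with a `ValueError` when a record is parsed.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    REGULATORY_INTERPRETATION = "regulatory_interpretation"
    NUMERICAL_REASONING = "numerical_reasoning"
    CONTRADICTION_DETECTION = "contradiction_detection"
    TEMPORAL_REASONING = "temporal_reasoning"


class Regulator(Enum):
    SEBI = "SEBI"
    RBI = "RBI"


@dataclass(frozen=True)
class QAPair:
    """Hypothetical record for one expert-annotated question-answer pair."""
    doc_id: str  # identifier of the source SEBI/RBI document (assumed field)
    regulator: Regulator
    task: TaskType
    question: str
    answer: str


def from_dict(raw: dict) -> QAPair:
    """Parse a raw record; Enum lookups raise ValueError on unknown labels."""
    return QAPair(
        doc_id=str(raw["doc_id"]),
        regulator=Regulator(raw["regulator"]),
        task=TaskType(raw["task"]),
        question=raw["question"],
        answer=raw["answer"],
    )


def summarize(pairs: list[QAPair]) -> dict:
    """Count pairs, distinct source documents, and pairs per task and per regulator."""
    return {
        "pairs": len(pairs),
        "documents": len({p.doc_id for p in pairs}),
        "by_task": Counter(p.task.value for p in pairs),
        "by_regulator": Counter(p.regulator.value for p in pairs),
    }
```

If a loaded copy of the benchmark matches the summary, `summarize` should report 406 pairs from 192 documents with task counts of 174, 92, 62, and 78, which makes it a quick sanity check after parsing.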

Best for: AI Engineer, Machine Learning Engineer, AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.