IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text

2026-04-21 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, FinTech & Digital Financial Services, Regulatory Affairs & Government Relations · Depth: Expert, quick

Summary

IndiaFinBench is presented as the first publicly available evaluation benchmark for assessing large language model (LLM) performance on Indian financial regulatory text. It addresses a gap in existing financial NLP benchmarks, which draw primarily on Western financial corpora. The benchmark comprises 406 expert-annotated question-answer pairs drawn from 192 documents published by the Securities and Exchange Board of India (SEBI) and the Reserve Bank of India (RBI). The pairs cover four task types: regulatory interpretation (174 items), numerical reasoning (92 items), contradiction detection (62 items), and temporal reasoning (78 items). Twelve models were evaluated under zero-shot conditions, with accuracy ranging from 70.4% (Gemma 4 E4B) to 89.7% (Gemini 2.5 Flash); every model outperformed a non-specialist human baseline of 60.0%. Numerical reasoning was the most discriminative task, that is, the one that best separated stronger models from weaker ones.
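As a quick arithmetic check on the composition described above, the short sketch below confirms that the four task counts add up to 406 and shows each task's share of the benchmark. The counts come from the summary; the variable names are illustrative only.

```python
# Task-type counts as reported in the summary above.
TASK_COUNTS = {
    "regulatory interpretation": 174,
    "numerical reasoning": 92,
    "contradiction detection": 62,
    "temporal reasoning": 78,
}

# The four task types should account for every item in the benchmark.
total = sum(TASK_COUNTS.values())

# Share of the benchmark per task type, as a percentage rounded to one decimal.
shares = {task: round(100 * n / total, 1) for task, n in TASK_COUNTS.items()}

for task, pct in shares.items():
    print(f"{task}: {TASK_COUNTS[task]} items ({pct}%)")
print(f"total: {total} items")
```

Regulatory interpretation makes up roughly 43% of the items, so an overall accuracy figure is weighted toward that task; per-task scores are worth reading alongside it.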

Key takeaway

For AI engineers developing or deploying LLMs for financial services in India, IndiaFinBench offers a way to measure performance on the regulatory text those systems will actually encounter. Use it to validate a model's accuracy on SEBI and RBI material before deployment, paying particular attention to numerical and temporal reasoning. Doing so can support compliance work and decision-making in a region that existing benchmarks have underserved.

Key insights

IndiaFinBench is a new benchmark for evaluating LLMs on Indian financial regulatory text, addressing a critical regional gap.

Principles

Non-Western regulatory texts require specialized LLM benchmarks.
Numerical reasoning is a highly discriminative LLM task.
Expert annotation is crucial for high-quality financial benchmarks.
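One simple way to make "discriminative" concrete is to look at how widely accuracy varies across models on each task. The sketch below is an assumption about how that could be measured, not the paper's stated method, and the accuracies in `toy_accuracy` are placeholder values for three hypothetical models, not benchmark results.

```python
from statistics import pstdev

# Placeholder per-task accuracies for three hypothetical models.
# These numbers are illustrative only and are NOT results from IndiaFinBench.
toy_accuracy = {
    "model_a": {"regulatory interpretation": 0.90, "numerical reasoning": 0.85},
    "model_b": {"regulatory interpretation": 0.88, "numerical reasoning": 0.70},
    "model_c": {"regulatory interpretation": 0.86, "numerical reasoning": 0.55},
}


def task_spread(acc_by_model):
    """Return {task: (range, population std dev)} of accuracy across models."""
    tasks = next(iter(acc_by_model.values())).keys()
    spread = {}
    for task in tasks:
        scores = [per_task[task] for per_task in acc_by_model.values()]
        spread[task] = (max(scores) - min(scores), pstdev(scores))
    return spread


spread = task_spread(toy_accuracy)

# The task with the widest accuracy range separates the models most clearly.
most_discriminative = max(spread, key=lambda task: spread[task][0])
print(most_discriminative, spread[most_discriminative])
```

Range is easy to read but sensitive to a single outlier model; the standard deviation returned alongside it gives a second view of the same spread.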

Method

IndiaFinBench was created by expert-annotating 406 Q&A pairs from SEBI and RBI documents, covering regulatory interpretation, numerical reasoning, contradiction detection, and temporal reasoning tasks.

In practice

Use IndiaFinBench to evaluate LLMs for Indian financial applications.
Compare candidate models on numerical reasoning, where their scores differ most.
Consider regional regulatory nuances in LLM development.
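The recommendations above come down to scoring a model separately on each task type. The following is a minimal sketch of that step, assuming each record carries a task label, a gold answer, and a model prediction. The field names `task`, `gold`, and `pred` are hypothetical, as is the normalized exact-match rule; the data format in rajveerpall/IndiaFinBench and the scoring used in the paper may differ.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def accuracy_by_task(records):
    """Compute exact-match accuracy per task type.

    `records` is an iterable of dicts with the hypothetical keys
    'task', 'gold', and 'pred'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["task"]] += 1
        if normalize(record["pred"]) == normalize(record["gold"]):
            correct[record["task"]] += 1
    return {task: correct[task] / total[task] for task in total}


# Toy records for illustration only; these are not items from the benchmark.
toy_records = [
    {"task": "numerical reasoning", "gold": "15%", "pred": "15%"},
    {"task": "numerical reasoning", "gold": "30 days", "pred": "45 days"},
    {"task": "temporal reasoning", "gold": "the later circular", "pred": "The later  circular"},
]

scores = accuracy_by_task(toy_records)
print(scores)
```

Exact match is a strict rule that suits short numerical or categorical answers; free-text regulatory interpretation answers would likely need a more lenient comparison or human review.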

Topics

IndiaFinBench
Indian Financial Regulation
LLM Evaluation
SEBI and RBI Documents
Numerical Reasoning

Code references

rajveerpall/IndiaFinBench

Best for: AI Engineer, Machine Learning Engineer, AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.