IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text
Summary
IndiaFinBench is the first publicly available evaluation benchmark designed to assess large language model (LLM) performance on Indian financial regulatory text. It addresses a gap in existing financial NLP benchmarks, which primarily use Western financial corpora. The benchmark comprises 406 expert-annotated question-answer pairs from 192 documents sourced from the Securities and Exchange Board of India (SEBI) and the Reserve Bank of India (RBI). These pairs cover four task types: regulatory interpretation (174 items), numerical reasoning (92 items), contradiction detection (62 items), and temporal reasoning (78 items). Evaluation of twelve models under zero-shot conditions showed accuracy ranging from 70.4% (Gemma 4 E4B) to 89.7% (Gemini 2.5 Flash), with all models outperforming a 60.0% non-specialist human baseline. Numerical reasoning proved to be the most discriminative task.
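The task composition reported above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch: the dictionary keys and the `task_shares` helper are illustrative names, not identifiers from the benchmark, and only the four item counts come from the summary.

```python
# Item counts per task type, as reported in the summary above.
# The snake_case keys are illustrative labels, not official task identifiers.
TASK_COUNTS = {
    "regulatory_interpretation": 174,
    "numerical_reasoning": 92,
    "contradiction_detection": 62,
    "temporal_reasoning": 78,
}


def task_shares(counts):
    """Return each task's share of all items as a percentage (1 decimal place)."""
    total = sum(counts.values())
    return {task: round(100 * n / total, 1) for task, n in counts.items()}


total_items = sum(TASK_COUNTS.values())
shares = task_shares(TASK_COUNTS)

print(f"total items: {total_items}")
for task, pct in shares.items():
    print(f"{task}: {pct}%")
```

The counts sum to the stated 406 items, with regulatory interpretation the largest share and contradiction detection the smallest.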
Key takeaway
For AI engineers building or deploying LLMs for financial services in India, IndiaFinBench offers a way to measure model performance on the regulatory text those systems will actually encounter. Use it to check accuracy on SEBI and RBI material before deployment, paying particular attention to numerical and temporal reasoning. Results on the benchmark can inform compliance-sensitive decisions in a jurisdiction that existing financial NLP benchmarks have largely overlooked.
Key insights
IndiaFinBench is a new benchmark for evaluating LLMs on Indian financial regulatory text, addressing a critical regional gap.
Principles
- Non-Western regulatory texts require specialized LLM benchmarks.
- Numerical reasoning is the task that best separates stronger models from weaker ones.
- Expert annotation is crucial for high-quality financial benchmarks.
Method
IndiaFinBench was created by expert-annotating 406 Q&A pairs from SEBI and RBI documents, covering regulatory interpretation, numerical reasoning, contradiction detection, and temporal reasoning tasks.
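One way to picture a single benchmark item is as a small validated record. The sketch below is an assumption about shape only: the field names, the `BenchmarkItem` class, and the task-type labels are hypothetical and are not taken from the benchmark's released format. The example values are placeholders, not real benchmark content.

```python
from dataclasses import dataclass

# Hypothetical labels for the four task types named in the summary.
TASK_TYPES = {
    "regulatory_interpretation",
    "numerical_reasoning",
    "contradiction_detection",
    "temporal_reasoning",
}
# The two regulators the source documents come from.
SOURCES = {"SEBI", "RBI"}


@dataclass(frozen=True)
class BenchmarkItem:
    """Hypothetical record for one expert-annotated question-answer pair."""

    item_id: str
    task_type: str
    source: str
    context: str   # passage from the regulatory document
    question: str
    answer: str    # expert-annotated reference answer

    def __post_init__(self):
        # Reject records whose labels fall outside the known sets.
        if self.task_type not in TASK_TYPES:
            raise ValueError(f"unknown task type: {self.task_type!r}")
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")


# Placeholder values only; none of this text comes from the benchmark.
example = BenchmarkItem(
    item_id="demo-001",
    task_type="numerical_reasoning",
    source="RBI",
    context="(placeholder passage)",
    question="(placeholder question)",
    answer="(placeholder answer)",
)
```

Validating labels at construction time makes it harder for a mistyped task type to silently skew per-task statistics later.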
In practice
- Use IndiaFinBench to evaluate LLMs for Indian financial applications.
- Compare candidate models on numerical reasoning, where the gaps between them are widest.
- Consider regional regulatory nuances in LLM development.
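The practices above amount to scoring a model per task type rather than only overall. The following is a rough sketch of such a loop under stated assumptions: the item dictionaries, the `predict` callable, and normalized exact-match scoring are all illustrative choices, since the summary does not describe how the benchmark scores answers. The toy items and stub model are placeholders, not benchmark data.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(text.lower().split())


def per_task_accuracy(items, predict):
    """Score a zero-shot `predict(context, question)` callable, grouped by task type.

    Scoring here is normalized exact match, which is an assumption of this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        task = item["task_type"]
        total[task] += 1
        prediction = predict(item["context"], item["question"])
        if normalize(prediction) == normalize(item["answer"]):
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


# Toy placeholder items; not drawn from the benchmark.
toy_items = [
    {"task_type": "numerical_reasoning", "context": "c1", "question": "q1", "answer": "10%"},
    {"task_type": "numerical_reasoning", "context": "c2", "question": "q2", "answer": "5 days"},
    {"task_type": "temporal_reasoning", "context": "c3", "question": "q3", "answer": "Yes"},
]


def stub_model(context, question):
    """Stand-in for a real LLM call; returns canned answers."""
    return "10%" if question == "q1" else "yes"


scores = per_task_accuracy(toy_items, stub_model)
print(scores)
```

Reporting accuracy per task keeps a weak numerical-reasoning score from being masked by strong results on the larger regulatory-interpretation subset. In real use, `stub_model` would be replaced by a call to the model under test, and exact match may need to be swapped for a more tolerant scorer.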
Topics
- IndiaFinBench
- Indian Financial Regulation
- LLM Evaluation
- SEBI and RBI Documents
- Numerical Reasoning
Best for: AI Engineer, Machine Learning Engineer, AI Scientist, Research Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.