AgentFinVQA: A Deployable Multi-Agent Pipeline for Auditable Financial Chart QA

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Data Science & Analytics · Depth: Expert, extended

Summary

AgentFinVQA is a multi-agent pipeline for auditable, on-premise financial chart question answering (QA) in regulated environments. It decomposes each query into five stages (planning, OCR, legend grounding, visual inspection, and verification) and records every step in a traceable Model Evaluation Packet (MEP). On the FinMME benchmark, AgentFinVQA reached 71.24% accuracy against 63.56% for a Gemini-3 Flash zero-shot baseline, a gain of +7.68 percentage points (pp). It also delivered a +4.84 pp gain with the open-weights Qwen3.6-27B-FP8 model served locally on a single A100-80G, showing that the pipeline can be deployed without relying on a proprietary API. The verifier's verdict doubles as a confidence signal for human-in-the-loop routing: answers it revised have lower exact accuracy (55.6%) than answers it confirmed (68.2%), so revised answers can be prioritized for review. Error analysis identified question misunderstanding, legend confusion, and extraction errors as the primary failure modes.

Key takeaway

For MLOps engineers deploying financial chart QA in regulated settings, AgentFinVQA offers an auditable pipeline that can run entirely on-premise. It reports a +4.84 pp accuracy gain with the open-weights Qwen3.6-27B-FP8 model served locally, so chart data does not have to leave your infrastructure. Recording each stage in a Model Evaluation Packet gives step-level traceability, and the verifier's confirmed/revised signal lets you route the answers most likely to be wrong to human review.
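The review routing described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `verdict` labels mirror the "confirmed" and "revised" outcomes named in the summary, while the record fields (`id`, `answer`, `verdict`), the queue names, and `route_for_review` are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumption: the verifier emits one of these two labels per answer.
# Revised answers showed lower exact accuracy (55.6% vs. 68.2%),
# so they are the ones sent to a human reviewer.
REVIEW_VERDICTS = {"revised"}


def route_for_review(results: Iterable[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Split verifier outputs into an auto-accept queue and a human-review queue."""
    queues: Dict[str, List[Dict[str, str]]] = {"auto_accept": [], "human_review": []}
    for result in results:
        if result["verdict"] in REVIEW_VERDICTS:
            queues["human_review"].append(result)
        else:
            queues["auto_accept"].append(result)
    return queues


# Hypothetical verifier outputs for three questions.
sample_results = [
    {"id": "q1", "answer": "12.4%", "verdict": "confirmed"},
    {"id": "q2", "answer": "Q3 2021", "verdict": "revised"},
    {"id": "q3", "answer": "Revenue", "verdict": "confirmed"},
]

queues = route_for_review(sample_results)
print([r["id"] for r in queues["human_review"]])
```

Since confirmed answers are still only 68.2% exact, a production setup would likely also sample from the auto-accept queue rather than treat it as fully trusted.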

Key insights

AgentFinVQA demonstrates that auditable, on-premise financial chart QA is practical with agentic decomposition.

Method

A multi-agent pipeline coordinates planning, OCR, legend grounding, visual inspection, and verification, recording each step in a Model Evaluation Packet (MEP).
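A minimal sketch of this staged design, assuming each stage is a plain function that reads shared state and returns an output that is logged before the next stage runs. The names `StepRecord`, `EvaluationPacket`, `run_pipeline`, and the stub stages are hypothetical placeholders; the summary does not specify the actual MEP schema or the agents' interfaces.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class StepRecord:
    """One logged stage: its name, what it produced, and how long it took."""
    stage: str
    output: Any
    elapsed_s: float


@dataclass
class EvaluationPacket:
    """Hypothetical stand-in for a Model Evaluation Packet (MEP)."""
    question: str
    steps: List[StepRecord] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, default=str)


Stage = Callable[[Dict[str, Any]], Any]


def run_pipeline(
    question: str, stages: List[Tuple[str, Stage]]
) -> Tuple[Dict[str, Any], EvaluationPacket]:
    """Run stages in order, storing each output in state and in the packet."""
    state: Dict[str, Any] = {"question": question}
    packet = EvaluationPacket(question=question)
    for name, stage_fn in stages:
        start = time.perf_counter()
        output = stage_fn(state)
        packet.steps.append(StepRecord(name, output, time.perf_counter() - start))
        state[name] = output
    return state, packet


# Stub stages standing in for model-backed agents (illustrative values only).
STUB_STAGES: List[Tuple[str, Stage]] = [
    ("planning", lambda s: ["read legend", "locate series", "compare values"]),
    ("ocr", lambda s: {"title": "Revenue by quarter", "x_ticks": ["Q1", "Q2"]}),
    ("legend_grounding", lambda s: {"blue": "Revenue"}),
    ("visual_inspection", lambda s: {"answer": "Q2"}),
    ("verification", lambda s: {"verdict": "confirmed",
                                "answer": s["visual_inspection"]["answer"]}),
]

final_state, packet = run_pipeline("Which quarter had higher revenue?", STUB_STAGES)
print(packet.to_json())
```

Writing the packet as the pipeline runs, rather than reconstructing it afterwards, means the audit trail reflects exactly what each stage saw and returned.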

Best for: CTO, VP of Engineering/Data, AI Architect, AI Engineer, MLOps Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.