AgentFinVQA: A Deployable Multi-Agent Pipeline for Auditable Financial Chart QA
Summary
AgentFinVQA is a multi-agent pipeline for auditable, on-premise financial chart question answering (QA) in regulated environments. It decomposes each query into five stages (planning, OCR, legend grounding, visual inspection, and verification) and records every step in a traceable Model Evaluation Packet (MEP). On the FinMME benchmark, AgentFinVQA improves accuracy by 7.68 percentage points (pp) over a Gemini-3 Flash zero-shot baseline (71.24% vs. 63.56%). It also delivers a +4.84 pp gain with the open-weights Qwen3.6-27B-FP8 model served locally on a single A100-80G, showing that the pipeline can be deployed without relying on proprietary APIs. The verifier's outcome doubles as a confidence signal for human-in-the-loop review: answers it revised have lower exact accuracy (55.6%) than answers it confirmed (68.2%), so revised answers can be routed to reviewers first. Error analysis identified question misunderstanding, legend confusion, and extraction errors as the primary failure modes.
Key takeaway
For MLOps engineers deploying financial chart QA in regulated settings, AgentFinVQA offers an auditable approach. The accuracy gains hold even with an open-weights model such as Qwen3.6-27B-FP8 served locally, which keeps data on-premise. Adopt the multi-agent pipeline and per-sample Model Evaluation Packets for traceability, and use the verifier's confidence signal to route answers that are more likely to be wrong to human review.
Key insights
AgentFinVQA demonstrates that auditable, on-premise financial chart QA is practical with agentic decomposition.
Principles
- Traceable reasoning steps are critical for auditability.
- Agentic decomposition enhances chart QA accuracy.
- Open-weights models enable on-premise data residency.
Method
A multi-agent pipeline coordinates planning, OCR, legend grounding, visual inspection, and verification, recording each step in a Model Evaluation Packet (MEP).
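The staged flow described above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `EvaluationPacket` and `StepRecord` classes, their fields, the `run_pipeline` function, and the stub agents are all hypothetical names, and the summary does not specify the actual MEP schema.

```python
import json
import time
from dataclasses import dataclass, field, asdict
from typing import Any, Callable

# Stage names follow the summary; the order is taken from the text.
STAGES = ["planning", "ocr", "legend_grounding", "visual_inspection", "verification"]


@dataclass
class StepRecord:
    """One traced step (hypothetical schema)."""
    stage: str
    output: Any
    elapsed_s: float


@dataclass
class EvaluationPacket:
    """Per-sample trace of every stage (hypothetical stand-in for an MEP)."""
    sample_id: str
    question: str
    steps: list[StepRecord] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested StepRecord dataclasses.
        return json.dumps(asdict(self), indent=2)


def run_pipeline(
    sample_id: str,
    question: str,
    agents: dict[str, Callable[[dict], Any]],
) -> EvaluationPacket:
    """Run each stage in order, passing earlier outputs forward and logging each step."""
    packet = EvaluationPacket(sample_id=sample_id, question=question)
    context: dict[str, Any] = {"question": question}
    for stage in STAGES:
        start = time.perf_counter()
        output = agents[stage](context)
        packet.steps.append(StepRecord(stage, output, time.perf_counter() - start))
        context[stage] = output  # later stages can read earlier results
    return packet


# Stub agents stand in for real model calls; the question is a placeholder.
stub_agents = {s: (lambda ctx, s=s: f"<{s} output>") for s in STAGES}
packet = run_pipeline("demo-001", "placeholder question", stub_agents)
print(packet.to_json())
```

Keeping the packet as a plain serializable record means each sample's trace can be written to disk and inspected later without re-running any model.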
In practice
- Decompose complex QA tasks into specialized agent steps.
- Use verifier confidence signals for human review routing.
- Generate per-sample Model Evaluation Packets for audit.
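The review-routing practice in the list above might look like the sketch below. It assumes the verifier emits one of two labels, `"confirmed"` or `"revised"`; those label strings, the `VerifiedAnswer` class, and `build_review_queue` are illustrative assumptions rather than the paper's interface. The ordering rests on the reported result that revised answers are less accurate (55.6%) than confirmed ones (68.2%).

```python
from dataclasses import dataclass


@dataclass
class VerifiedAnswer:
    sample_id: str
    answer: str
    verdict: str  # assumed labels: "confirmed" or "revised"


def build_review_queue(answers: list[VerifiedAnswer]) -> list[VerifiedAnswer]:
    """Put revised answers first; sorted() is stable, so input order is kept within each group."""
    return sorted(answers, key=lambda a: a.verdict != "revised")


# Placeholder data for illustration only.
answers = [
    VerifiedAnswer("a", "placeholder-1", "confirmed"),
    VerifiedAnswer("b", "placeholder-2", "revised"),
    VerifiedAnswer("c", "placeholder-3", "confirmed"),
]
queue = build_review_queue(answers)
print([a.sample_id for a in queue])
```

With a fixed reviewer budget, taking items from the front of this queue spends human attention on the group with the lower reported accuracy first.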
Topics
- Financial Chart QA
- Multi-Agent Systems
- Auditable AI
- On-premise LLMs
- Model Evaluation Packet
- VLM Deployment
Best for: CTO, VP of Engineering/Data, AI Architect, AI Engineer, MLOps Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.