How to build Multi Agents for FINANCE: Outperforming Anthropic
Summary
IBM Research's February 26, 2026 evaluation of AI agents points to a trade-off between cost and performance. High-performing models such as Claude Opus 4.5 reach 73% accuracy at a cost of $8 to $45, while GPT-5.2 costs far less ($0.25 to $0.50) but reaches only 39%. The article argues that existing financial AI agents, such as those from Anthropic, mainly automate predefined workflows and data gathering rather than exhibiting true intelligence. A study from February 25, 2026 introduces Yuan 4.0, a 36-billion-parameter open-source model that outperforms proprietary models such as Claude 4.5 by at least 9 percentage points on financial tasks. This result is attributed to a novel training methodology that uses the Financial Intelligence and Reasoning Evaluation (FIRE) benchmark, which comprises 14,000 theoretical questions and 3,000 real-world financial scenarios, together with a dual reward system for reinforcement learning.
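To make the cost/performance trade-off concrete, here is a minimal sketch that divides each quoted cost by the quoted accuracy to get dollars per accuracy point. The figures come from the summary above; the summary does not say what the dollar amounts are per (task, run, or full benchmark), so the results are only meaningful relative to each other. The names `results`, `cost_per_point`, and `summarize` are hypothetical helpers for illustration.

```python
# Figures quoted in the summary. The cost unit (per task, per run, ...) is
# not stated there, so treat the outputs as relative comparisons only.
results = {
    "Claude Opus 4.5": {"accuracy_pct": 73.0, "cost_range": (8.00, 45.00)},
    "GPT-5.2": {"accuracy_pct": 39.0, "cost_range": (0.25, 0.50)},
}


def cost_per_point(cost: float, accuracy_pct: float) -> float:
    """Dollars spent per percentage point of accuracy."""
    return cost / accuracy_pct


def summarize(results: dict) -> dict:
    """Return a (low, high) cost-per-accuracy-point range for each model."""
    summary = {}
    for model, r in results.items():
        low, high = r["cost_range"]
        acc = r["accuracy_pct"]
        summary[model] = (cost_per_point(low, acc), cost_per_point(high, acc))
    return summary


if __name__ == "__main__":
    for model, (lo, hi) in summarize(results).items():
        print(f"{model}: ${lo:.4f} to ${hi:.4f} per accuracy point")
```

On these numbers the cheaper model costs roughly 8x to 96x less per accuracy point, which is exactly why the framing is a trade-off: a per-point bargain does not help if 39% accuracy is below what a financial workflow can tolerate.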
Key takeaway
For AI scientists and NLP engineers developing financial applications, relying solely on large proprietary models may not yield optimal performance or cost-efficiency. You should explore fine-tuning smaller, open-source models like Yuan 4.0 with domain-specific data and advanced training methodologies, such as the dual reward system and reverse chain-of-thought synthesis, to achieve superior results in complex financial reasoning tasks and potentially run models locally behind your firewall.
Key insights
Specialized, locally runnable LLMs can outperform large proprietary models in domain-specific tasks through targeted training and novel evaluation.
Principles
- In current evaluations, higher agent performance comes at a higher cost.
- General agents struggle with real-world, multi-step financial scenarios.
- Process-oriented rewards improve AI reasoning beyond outcome-based metrics.
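The difference between outcome-based and process-oriented rewards can be shown with a toy example. In this sketch, assuming each reasoning step already carries a correctness score in [0, 1] (from a hypothetical step-level verifier or human annotator), the outcome reward looks only at the final answer while the process reward averages the step scores. The `Trace` type and the financial example are illustrative, not the method used for Yuan 4.0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trace:
    steps: List[str]          # intermediate reasoning steps
    step_scores: List[float]  # per-step correctness in [0, 1], assumed to come
                              # from a hypothetical step-level verifier
    final_answer: str


def outcome_reward(trace: Trace, reference: str) -> float:
    """Outcome-based: only the final answer counts."""
    return 1.0 if trace.final_answer.strip() == reference.strip() else 0.0


def process_reward(trace: Trace) -> float:
    """Process-oriented: mean quality of the intermediate steps."""
    if not trace.step_scores:
        return 0.0
    return sum(trace.step_scores) / len(trace.step_scores)


# Both traces reach the correct answer, but one gets there via a flawed step.
sound = Trace(["compute EBITDA", "divide debt by EBITDA"], [1.0, 1.0], "2.5x")
lucky = Trace(["misread the debt figure", "divide debt by EBITDA"], [0.0, 1.0], "2.5x")

if __name__ == "__main__":
    for name, t in [("sound", sound), ("lucky", lucky)]:
        print(name, outcome_reward(t, "2.5x"), process_reward(t))
```

The outcome reward cannot tell the two traces apart, while the process reward penalizes the lucky one. Averaging is one choice; taking the minimum step score is a stricter alternative.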
Method
Yuan 4.0's training combines continual pre-training with self-regularization, DPO-based fine-tuning, reverse chain-of-thought synthesis that turns human expert rationales into explicit logical reasoning trajectories, and reinforcement learning with a dual reward system (format and accuracy), so that the model learns to reason the way human experts do.
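The summary names DPO-based fine-tuning without giving details. For reference, here is a minimal sketch of the standard Direct Preference Optimization objective for a single preference pair, assuming the sequence log-probabilities have already been summed over response tokens for both the policy and a frozen reference model. The value `beta = 0.1` is an illustrative default, not a setting reported for Yuan 4.0, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    Each *_logp is log pi(y | x) summed over the response tokens.
    """
    # How much more (in log space) the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


def batch_dpo_loss(pairs: Sequence[Tuple[float, float, float, float]],
                   beta: float = 0.1) -> float:
    """Mean DPO loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)


if __name__ == "__main__":
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # policy identical to reference
    print(dpo_loss(-9.0, -13.0, -10.0, -12.0))   # policy prefers the chosen response more
```

When the policy and the reference agree, the margin is zero and the loss equals log 2; shifting probability toward the chosen response (relative to the reference) lowers it, and shifting it toward the rejected response raises it.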
In practice
- Consider fine-tuning open-source models for domain-specific tasks.
- Implement dual reward systems for process-oriented RL.
- Utilize human expert rationales for reverse chain-of-thought fine-tuning.
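The last two practices can be combined into a small data pipeline. This sketch assumes a completion format of `<think>...</think><answer>...</answer>` (a common convention, not one the source specifies), defines a format reward and an accuracy reward, and sums them with illustrative weights. It then builds reverse chain-of-thought prompts from an expert rationale and its known answer and keeps only the synthesized trajectories that earn the full dual reward. Here the reward is used as a rejection-sampling filter; in reinforcement learning the same scalar would serve as the training reward instead. `generate` stands in for any model call, and every name here is hypothetical.

```python
import re
from typing import Callable, Dict, List

# Assumed layout: reasoning in <think>...</think>, then the answer in <answer>...</answer>.
FORMAT_RE = re.compile(r"^\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the expected tag layout, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def accuracy_reward(completion: str, reference: str, tol: float = 1e-2) -> float:
    """1.0 if the tagged answer matches the reference (numerically when possible)."""
    m = FORMAT_RE.match(completion)
    if not m:
        return 0.0
    answer = m.group(2).strip()
    try:
        return 1.0 if abs(float(answer) - float(reference)) <= tol else 0.0
    except ValueError:
        return 1.0 if answer.lower() == reference.strip().lower() else 0.0


def dual_reward(completion: str, reference: str,
                w_format: float = 0.2, w_acc: float = 0.8) -> float:
    """Weighted sum of format and accuracy rewards (weights are assumptions)."""
    return (w_format * format_reward(completion)
            + w_acc * accuracy_reward(completion, reference))


def reverse_cot_prompt(question: str, rationale: str, answer: str) -> str:
    """Ask a model to expand an expert's terse rationale into explicit steps."""
    return (
        "Given the question, an expert's rationale, and the final answer, "
        "write the step-by-step reasoning that leads to the answer.\n"
        f"Question: {question}\nExpert rationale: {rationale}\nAnswer: {answer}\n"
        "Respond as <think>steps</think><answer>final answer</answer>."
    )


def synthesize(examples: List[Dict[str, str]], generate: Callable[[str], str],
               threshold: float = 1.0) -> List[Dict[str, str]]:
    """Keep only synthesized trajectories that earn the full dual reward."""
    kept = []
    for ex in examples:
        prompt = reverse_cot_prompt(ex["question"], ex["rationale"], ex["answer"])
        completion = generate(prompt)
        # Small epsilon guards against floating-point rounding in the weighted sum.
        if dual_reward(completion, ex["answer"]) >= threshold - 1e-9:
            kept.append({"prompt": ex["question"], "completion": completion})
    return kept
```

The kept pairs pose only the original question as the prompt, so a model fine-tuned on them learns to produce the full reasoning trajectory without seeing the expert rationale.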
Topics
- Financial AI Agents
- Yuan 4.0 LLM
- FIRE Benchmark
- Reinforcement Learning
- Process Reward Models
Best for: NLP Engineer, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, AI Researcher
Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.