The Hot Path Belongs to GBDTs, Agents Own the Cold Path: A Payment-Fraud Benchmark
Summary
This article benchmarks LLM agents against gradient-boosted decision trees (GBDTs) for synchronous payment-authorization fraud detection. The benchmark, which runs on a laptop without GPUs or cloud APIs, concludes that classical ML (GBDTs) remains the better fit for the "hot path" on latency, cost, and regulatory grounds, while LLM agents are better suited to the "cold path" of asynchronous fraud investigation. Key findings: GBDT p99 latency is 0.15 ms, whereas the LLM simulator's is 1,200 ms, far above the 100 ms ISO 8583 budget. The cost analysis puts GBDT at ~$54/hour for 50,000 TPS, compared with $16,200 for gpt-4o-mini-class models and $351,000 for Claude Sonnet 4.6. Crucially, GBDTs are deterministic: 500 identical inputs return 1 distinct score, while the LLM yields 498 distinct scores, which poses a significant challenge for regulatory compliance (SR 26-2). The recommended hybrid architecture uses GBDTs for real-time scoring and agents for post-flagging tasks such as SAR drafting and evidence gathering, with an "agent-as-a-judge" for validation.
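To put the cost figures on a common footing, the arithmetic can be sketched as follows. This is an illustration rather than the article's own code, and it assumes all three figures are hourly costs at 50,000 TPS (the summary labels only the GBDT figure as per hour). The dictionary keys and function names are hypothetical.

```python
# Figures quoted in the summary. ASSUMPTION: all three are hourly costs at
# 50,000 TPS; the summary labels only the GBDT figure as per hour.
TPS = 50_000
HOURLY_COST_USD = {
    "gbdt": 54.0,
    "gpt-4o-mini-class": 16_200.0,
    "claude-sonnet-4.6": 351_000.0,
}

# 50,000 transactions/second * 3,600 seconds/hour
transactions_per_hour = TPS * 3600


def cost_per_million(hourly_cost: float) -> float:
    """USD per one million scored transactions at the assumed volume."""
    return hourly_cost / transactions_per_hour * 1_000_000


def ratio_to_gbdt(name: str) -> float:
    """How many times more expensive a model is than the GBDT baseline."""
    return HOURLY_COST_USD[name] / HOURLY_COST_USD["gbdt"]


if __name__ == "__main__":
    for name, cost in HOURLY_COST_USD.items():
        print(
            f"{name}: ${cost_per_million(cost):,.2f} per 1M transactions, "
            f"{ratio_to_gbdt(name):,.0f}x GBDT"
        )
```

Under that assumption, the GBDT works out to roughly $0.30 per million transactions, and the two LLM options come to about 300 and 6,500 times the GBDT cost.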
Key takeaway
For AI architects designing fraud detection systems in regulated financial services: prioritize deterministic GBDT models for synchronous hot-path authorization. Your p99 latency must meet the ISO 8583 budget, and reproducibility is non-negotiable for SR 26-2 compliance. Route complex cases to an asynchronous cold path, using LLM agents for evidence gathering and SAR drafting, but always include an agent-as-a-judge for independent validation before human review. This hybrid approach supports both compliance and operational efficiency.
Key insights
Classical ML wins on latency, cost, and determinism in synchronous fraud detection, while LLM agents fit asynchronous cold-path tasks.
Principles
- Synchronous authorization demands sub-100ms latency.
- Regulated models require reproducible, deterministic outputs.
- Batch-size-1 inference forfeits the batching efficiencies that make GPU-hosted LLMs cost-effective.
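The first two principles can be checked mechanically. The sketch below is a minimal illustration, not the article's benchmark harness: it computes a nearest-rank p99 over measured latencies and counts distinct scores over repeated identical inputs. Both scoring functions are hypothetical stand-ins (a fixed formula in place of a trained GBDT, and seeded Gaussian noise in place of a sampled LLM score), and the feature names are invented for the example.

```python
import random
import time

LATENCY_BUDGET_MS = 100.0  # authorization budget cited in the summary


def p99_ms(latencies_ms):
    """Nearest-rank 99th percentile of a list of latencies (milliseconds)."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def measure_latencies_ms(score_fn, features, runs=1000):
    """Time `runs` single-request (batch size 1) calls to score_fn."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        score_fn(features)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def distinct_scores(score_fn, features, runs=500):
    """Score the same input repeatedly and count the distinct outputs."""
    return len({score_fn(features) for _ in range(runs)})


def deterministic_stub(features):
    # HYPOTHETICAL stand-in for a trained GBDT: a fixed function of its input.
    return round(0.7 * features["amount_zscore"] + 0.3 * features["velocity"], 6)


_rng = random.Random(0)


def sampling_stub(features):
    # HYPOTHETICAL stand-in for a sampled LLM score: same input, noisy output.
    return deterministic_stub(features) + _rng.gauss(0.0, 0.05)


if __name__ == "__main__":
    x = {"amount_zscore": 1.2, "velocity": 0.4}
    p99 = p99_ms(measure_latencies_ms(deterministic_stub, x))
    print(f"p99 = {p99:.4f} ms, within budget: {p99 <= LATENCY_BUDGET_MS}")
    print("distinct (deterministic):", distinct_scores(deterministic_stub, x))
    print("distinct (sampling):", distinct_scores(sampling_stub, x))
```

A deterministic scorer should report exactly one distinct score. Any count above one means identical inputs cannot be replayed to reproduce a past decision.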
Method
The article proposes a hybrid architecture: GBDT for hot-path scoring and LLM agents for cold-path tasks like SAR drafting, evidence gathering, and an "agent-as-a-judge" validation pass.
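One way to picture the split is a synchronous scoring function that hands flagged transactions to a queue for later investigation. The sketch below is an assumption-laden illustration, not the article's implementation: the threshold, the feature names, the scoring formula, and the "flagged"/"approved" labels are all hypothetical, and an in-process `queue.Queue` stands in for whatever durable queue a real system would use.

```python
import queue
from dataclasses import dataclass

FLAG_THRESHOLD = 0.8  # HYPOTHETICAL cutoff; the summary does not give one


@dataclass(frozen=True)
class Transaction:
    txn_id: str
    amount_zscore: float
    velocity: float


def hot_path_score(txn: Transaction) -> float:
    # HYPOTHETICAL placeholder for a GBDT prediction: a deterministic
    # function of the features, clipped to [0, 1].
    raw = 0.2 * txn.amount_zscore + 0.1 * txn.velocity
    return min(1.0, max(0.0, raw))


# Stand-in for a durable queue feeding the asynchronous cold path.
cold_path_queue: "queue.Queue[tuple[Transaction, float]]" = queue.Queue()


def authorize(txn: Transaction) -> str:
    """Synchronous hot path: score, decide, and enqueue flagged cases."""
    score = hot_path_score(txn)
    if score >= FLAG_THRESHOLD:
        # Hand off for evidence gathering and SAR drafting. The authorization
        # response does not wait for any agent to run.
        cold_path_queue.put((txn, score))
        return "flagged"
    return "approved"


if __name__ == "__main__":
    print(authorize(Transaction("t1", amount_zscore=0.5, velocity=0.2)))
    print(authorize(Transaction("t2", amount_zscore=4.0, velocity=3.0)))
    print("queued for cold path:", cold_path_queue.qsize())
```

The design point is that the hot path only ever calls the deterministic scorer; agent work starts from the queue and never sits inside the authorization response.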
In practice
- Use GBDTs for real-time payment authorization.
- Route edge cases to an asynchronous cold path.
- Implement an agent-as-a-judge for LLM output validation.
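The validation step in the last item can be sketched as a gate between the drafting agent and human review. In the article the judge is itself an agent; the version below swaps in simple rule-based checks so the control flow is runnable without an LLM. All class names, fields, and routing labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SarDraft:
    case_id: str
    narrative: str
    cited_evidence_ids: list


@dataclass
class JudgeVerdict:
    passed: bool
    reasons: list = field(default_factory=list)


def judge_draft(draft: SarDraft, evidence_ids: set) -> JudgeVerdict:
    """Rule-based stand-in for the judge: check the draft independently
    against the evidence that was actually gathered for the case."""
    reasons = []
    if not draft.narrative.strip():
        reasons.append("empty narrative")
    if not draft.cited_evidence_ids:
        reasons.append("no evidence cited")
    unknown = [e for e in draft.cited_evidence_ids if e not in evidence_ids]
    if unknown:
        reasons.append(f"cites unknown evidence: {unknown}")
    return JudgeVerdict(passed=not reasons, reasons=reasons)


def route_draft(draft: SarDraft, evidence_ids: set) -> str:
    """Only drafts that pass the judge are forwarded to human review."""
    verdict = judge_draft(draft, evidence_ids)
    return "human_review" if verdict.passed else "redraft"


if __name__ == "__main__":
    gathered = {"ev-1", "ev-2"}
    good = SarDraft("case-9", "Rapid card testing across merchants.", ["ev-1"])
    bad = SarDraft("case-9", "Rapid card testing across merchants.", ["ev-7"])
    print(route_draft(good, gathered))
    print(route_draft(bad, gathered))
```

Keeping the judge's inputs separate from the drafter's reasoning (here, only the draft and the gathered evidence IDs) is what makes the check independent.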
Topics
- Payment Fraud Detection
- Gradient-Boosted Decision Trees
- LLM Agents
- Synchronous Authorization
- Model Risk Management (SR 26-2)
- Asynchronous Cold Path
- Agent-as-a-Judge
Best for: CTO, VP of Engineering/Data, MLOps Engineer, Machine Learning Engineer, AI Architect, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.