The Hot Path Belongs to GBDTs, Agents Own the Cold Path: A Payment-Fraud Benchmark

2026-06-25 · Source: Towards Data Science · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cybersecurity & Data Privacy · Depth: Advanced, long

Summary

This article benchmarks LLM agents against gradient-boosted decision trees (GBDTs) for synchronous payment-authorization fraud detection. The benchmark runs on a laptop with no GPUs or cloud APIs. It concludes that classical ML (GBDTs) remains the better fit for the "hot path" because of decisive latency, cost, and regulatory advantages, while LLM agents are better suited to the "cold path" of asynchronous fraud investigation. GBDT p99 latency is 0.15 ms, versus 1,200 ms for the simulated LLM, which far exceeds the 100 ms ISO 8583 authorization budget. At 50,000 TPS, the GBDT costs about $54/hour, compared with $16,200/hour for a gpt-4o-mini-class model and $351,000/hour for Claude Sonnet 4.6. Crucially, GBDTs are deterministic: 500 identical inputs return a single distinct score, while the LLM returns 498 distinct scores, a serious obstacle to regulatory compliance under SR 26-2. The recommended hybrid architecture uses GBDTs for real-time scoring and agents for post-flag work such as evidence gathering and SAR (Suspicious Activity Report) drafting, with an "agent-as-a-judge" pass to validate agent output.
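
The hourly LLM figures follow from per-call token arithmetic at 180 million calls per hour (50,000 TPS × 3,600 s). The sketch below uses one set of assumed inputs that are not taken from the benchmark: 400 input and 50 output tokens per call, and list prices of $0.15/$0.60 (gpt-4o-mini class) and $3/$15 (Claude Sonnet class) per million input/output tokens. Under those assumptions the arithmetic lands on the summary's numbers; the benchmark's own token counts and prices may differ.

```python
from decimal import Decimal

# Hypothetical per-call token counts (assumptions, not from the benchmark).
INPUT_TOKENS = 400
OUTPUT_TOKENS = 50

# Assumed list prices in USD per 1M tokens: (input, output).
PRICES = {
    "gpt-4o-mini-class": (Decimal("0.15"), Decimal("0.60")),
    "claude-sonnet-class": (Decimal("3"), Decimal("15")),
}


def hourly_llm_cost(tps: int, price_in: Decimal, price_out: Decimal,
                    input_tokens: int = INPUT_TOKENS,
                    output_tokens: int = OUTPUT_TOKENS) -> Decimal:
    """USD per hour if every authorization triggers exactly one LLM call."""
    calls_per_hour = tps * 3600
    cost_per_call = (input_tokens * price_in + output_tokens * price_out) / Decimal(1_000_000)
    return calls_per_hour * cost_per_call


if __name__ == "__main__":
    for name, (p_in, p_out) in PRICES.items():
        print(f"{name}: ${hourly_llm_cost(50_000, p_in, p_out):,.0f}/hour")
```

Decimal keeps the arithmetic exact. The point of the exercise is scale: at 180 million calls per hour, even a per-call cost of a few hundredths of a cent becomes thousands of dollars.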

Key takeaway

For AI architects designing fraud detection systems in regulated financial services: use deterministic GBDT models for synchronous hot-path authorization, where p99 latency must fit within the ISO 8583 budget and reproducibility is non-negotiable for SR 26-2 compliance. Route complex cases to an asynchronous cold path that uses LLM agents for evidence gathering and SAR drafting, and always include an agent-as-a-judge pass for independent validation before human review. This hybrid keeps the authorization path fast and reproducible while reserving agents for work that is not latency-bound.
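
Checking the p99 requirement comes down to taking the 99th percentile of measured per-request latencies and comparing it with the budget. A minimal sketch using the nearest-rank percentile; the function names are hypothetical, and the 100 ms default mirrors the budget cited in the summary.

```python
from typing import Sequence


def p99_nearest_rank(latencies_ms: Sequence[float]) -> float:
    """99th percentile by the nearest-rank method (no interpolation)."""
    n = len(latencies_ms)
    if n == 0:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (99 * n + 99) // 100  # integer ceil(0.99 * n), 1-based
    return ordered[rank - 1]


def fits_hot_path(latencies_ms: Sequence[float], budget_ms: float = 100.0) -> bool:
    """True if p99 latency stays within the synchronous authorization budget."""
    return p99_nearest_rank(latencies_ms) <= budget_ms
```

Nearest-rank is one of several percentile definitions; interpolating variants can report slightly different values on small samples, so it is worth fixing one definition for both the benchmark and production monitoring.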

Key insights

Classical ML excels at synchronous fraud detection because of its latency, cost, and determinism advantages, while LLM agents fit asynchronous cold-path tasks.

Principles

Synchronous authorization demands sub-100 ms latency.
Regulated models require reproducible, deterministic outputs.
Batch-size-1 inference forfeits the request batching that makes GPU-served LLMs cost-efficient.
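
The reproducibility principle can be tested directly: score the same input N times and count distinct outputs, which is the shape of the 1-versus-498 comparison in the summary. A minimal sketch; `deterministic_score` is a hypothetical toy stand-in for a fitted tree ensemble, and `sampled_score` imitates sampled LLM output by adding random jitter. Neither is the benchmark's code.

```python
import random
from typing import Callable, Mapping

Features = Mapping[str, float]


def distinct_scores(score_fn: Callable[[Features], float], x: Features, n: int = 500) -> int:
    """Score the same input n times and count how many distinct outputs appear."""
    return len({score_fn(x) for _ in range(n)})


def deterministic_score(x: Features) -> float:
    # Toy stand-in for a tree ensemble: fixed split thresholds, fixed leaf values.
    s = 0.1 if x["amount"] < 500 else 0.6
    s += 0.3 if x["new_device"] else 0.0
    return s


_rng = random.Random()


def sampled_score(x: Features) -> float:
    # Toy stand-in for a sampled LLM: identical input, jittered score.
    return round(deterministic_score(x) + _rng.uniform(-0.05, 0.05), 4)
```

A trained GBDT at inference time is a fixed function of its input, so its count is 1. For an LLM the count depends on sampling settings and the serving stack; this toy only shows how the test is framed, not what a real model would return.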

Method

The article proposes a hybrid architecture: GBDT for hot-path scoring and LLM agents for cold-path tasks like SAR drafting, evidence gathering, and an "agent-as-a-judge" validation pass.
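
The split can be sketched as a synchronous scorer that always answers inside the authorization call, plus a non-blocking hand-off of flagged transactions to the cold path. The thresholds, the `Decision` values, the "approve now, investigate later" policy for the review band, and the in-process `queue.Queue` standing in for a durable message broker are all hypothetical choices for illustration.

```python
import queue
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Mapping


class Decision(Enum):
    APPROVE = "approve"
    REVIEW = "review"    # hypothetical policy: approve now, investigate asynchronously
    DECLINE = "decline"


@dataclass
class HotPathRouter:
    score_fn: Callable[[Mapping[str, float]], float]  # e.g. a GBDT predict call
    cold_path: queue.Queue                            # stand-in for a message broker
    review_threshold: float = 0.5                     # hypothetical cut-offs
    decline_threshold: float = 0.9

    def authorize(self, txn_id: str, features: Mapping[str, float]) -> Decision:
        score = self.score_fn(features)  # the only work done inside the auth budget
        if score >= self.decline_threshold:
            decision = Decision.DECLINE
        elif score >= self.review_threshold:
            decision = Decision.REVIEW
        else:
            return Decision.APPROVE
        # Enqueue without blocking; agents consume this later, outside the budget.
        self.cold_path.put_nowait({"txn_id": txn_id, "score": score, "decision": decision.value})
        return decision
```

The design choice that matters is that nothing on the cold path can delay `authorize`: the agent work starts only after the synchronous decision has already been returned.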

In practice

Use GBDTs for real-time payment authorization.
Route edge cases to an asynchronous cold path.
Implement an agent-as-a-judge for LLM output validation.
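
An agent-as-a-judge pass can be framed as a gate between the drafting agent and the human review queue: an independent checker must accept the draft before an analyst sees it. In this minimal sketch the judge is swapped for deterministic rule checks (hypothetical ones: required SAR sections are present and every cited transaction appears in the gathered evidence); in the architecture described above the judge would be a separate LLM agent, which this code does not call. The section names and the `TXN-` ID format are assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

REQUIRED_SECTIONS = ("Subject", "Activity", "Narrative")  # hypothetical SAR outline
TXN_REF = re.compile(r"\bTXN-\d+\b")                      # hypothetical transaction ID format


@dataclass
class Verdict:
    approved: bool
    issues: List[str] = field(default_factory=list)


def judge_draft(draft: str, evidence_txn_ids: Set[str]) -> Verdict:
    """Independent check on a drafted SAR before it reaches a human reviewer."""
    issues: List[str] = []
    for section in REQUIRED_SECTIONS:
        if f"{section}:" not in draft:
            issues.append(f"missing section: {section}")
    cited = set(TXN_REF.findall(draft))
    if not cited:
        issues.append("no transactions cited")
    unsupported = sorted(cited - evidence_txn_ids)
    if unsupported:
        issues.append(f"cites transactions not in evidence: {unsupported}")
    return Verdict(approved=not issues, issues=issues)
```

A rejected draft carries its issue list, so it can be sent back to the drafting agent or surfaced to the analyst as flagged rather than silently passed through.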

Topics

Payment Fraud Detection
Gradient-Boosted Decision Trees
LLM Agents
Synchronous Authorization
Model Risk Management (SR 26-2)
Asynchronous Cold Path
Agent-as-a-Judge

Code references

sandeepmb/fraud-agents-benchmark

Best for: CTO, VP of Engineering/Data, MLOps Engineer, Machine Learning Engineer, AI Architect, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.