Agentic Trading: When LLM Agents Meet Financial Markets
Summary
The paper "Agentic Trading: When LLM Agents Meet Financial Markets" provides an audit-oriented evidence map of 77 studies on Large Language Model (LLM)-based trading agents, screened through March 9, 2026. It reframes these agents as expert-system decision pipelines. A primary empirical subset of 19 studies, each satisfying "Action Output plus Closed-Loop Evaluation," reveals that evaluation protocols are largely incomparable across studies. Only 2/19 studies report extractable time-consistent split protocols, 1/19 reports an explicit transaction-cost model, 1/19 documents universe/survivorship handling, and 11/19 report execution timing or semantics. On the paper's R0-R3 reproducibility scale, 15/19 studies are rated R0 and none reaches R3. The survey introduces an Architecture–Capability–Adaptation (A-C-A) analytical lens and proposes an evidence ledger, a reproducibility audit, and a reporting checklist to address these bottlenecks.
Key takeaway
AI scientists and MLOps engineers developing LLM-based trading agents should prioritize rigorous evaluation protocols. Systems should explicitly report time-consistent data splits, transaction-cost models, and execution semantics (MR-1 to MR-7). Without reproducible artifacts and detailed logs, performance claims remain preliminary and cannot be compared across studies, which hinders adoption and trust in real-world financial deployments. Focus on auditability to bridge the gap between architectural innovation and verifiable market impact.
Key insights
LLM-based trading agent research lacks comparable evaluation protocols, hindering reliable performance assessment and reproducibility.
Principles
- Agentic trading systems require explicit perception-memory-reasoning-action loops.
- Reproducibility hinges on transparent protocol reporting, not just headline performance.
- Auditability demands grounded, time-stamped tool calls and execution logs.
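One way to make the third principle concrete is a hash-chained log, in which every time-stamped tool call records the hash of the entry before it. The sketch below is illustrative only and is not taken from the paper: the function names `append_entry` and `verify_chain`, the record fields, and the sample tool calls are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder hash for the first entry (assumption)


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the record body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log, tool, args, result, ts=None):
    """Append a time-stamped tool-call record linked to the previous entry."""
    body = {
        "ts": ts or datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "args": args,
        "result": result,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry


def verify_chain(log):
    """Return True if every entry's hash and back-link are intact."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical tool calls; tickers, prices, and timestamps are made up.
log = []
append_entry(log, "get_price", {"ticker": "XYZ"}, {"close": 101.5},
             ts="2024-01-02T21:00:00+00:00")
append_entry(log, "place_order", {"ticker": "XYZ", "side": "buy", "qty": 10},
             {"status": "filled"}, ts="2024-01-03T14:30:00+00:00")
print(verify_chain(log))
```

A chain like this detects edits to, or reordering of, recorded entries. It does not detect truncation of the most recent entries unless the latest hash is also stored somewhere outside the log.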
Method
The paper proposes an audit-oriented evidence mapping approach, categorizing 77 studies into a primary empirical subset (n=19) and background (n=58), then auditing the primary subset for protocol completeness and reproducibility (R0-R3).
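The tallying step of such an audit can be sketched as a count over a per-study ledger. This is a minimal illustration, not the paper's tooling: the field names are hypothetical labels for the four protocol items named in the summary, and the three ledger rows are placeholders rather than real studies.

```python
from collections import Counter

# Hypothetical labels for the protocol items named in the summary.
PROTOCOL_FIELDS = (
    "time_consistent_split",
    "transaction_cost_model",
    "universe_survivorship",
    "execution_timing",
)


def tally(ledger):
    """Count, per protocol item, how many studies report it."""
    counts = Counter()
    for study in ledger:
        for field in PROTOCOL_FIELDS:
            if study.get(field, False):
                counts[field] += 1
    total = len(ledger)
    return {field: f"{counts[field]}/{total}" for field in PROTOCOL_FIELDS}


# Placeholder rows for illustration only; not data from the paper.
ledger = [
    {"id": "study_a", "time_consistent_split": True, "execution_timing": True},
    {"id": "study_b", "execution_timing": True},
    {"id": "study_c", "transaction_cost_model": True},
]

for field, share in tally(ledger).items():
    print(f"{field}: {share}")
```

Keeping the ledger as one record per study means each reported fraction can be traced back to the specific studies behind it.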
In practice
- Implement strict time-consistent data splits and embargo rules.
- Report explicit transaction costs and execution semantics.
- Provide code, data, and immutable logs for reproducibility.
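The first two practices above can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the paper's protocol: `time_split` and `backtest` are hypothetical names, the embargo is a fixed calendar gap, costs are a flat rate in basis points on turnover, and a target decided at bar t is filled at the close of bar t+1.

```python
from datetime import date, timedelta


def time_split(timestamps, train_end, embargo):
    """Train on t <= train_end; test on t > train_end + embargo.

    Observations inside the embargo window are dropped from both sets.
    """
    test_start = train_end + embargo
    train = [i for i, t in enumerate(timestamps) if t <= train_end]
    test = [i for i, t in enumerate(timestamps) if t > test_start]
    return train, test


def backtest(prices, targets, cost_bps=10.0):
    """Per-bar net returns with a one-bar execution delay and turnover costs.

    targets[t] is the desired position (fraction of capital) decided with
    information up to bar t. It is filled at the close of bar t+1, so it
    first earns the return from bar t+1 to bar t+2.
    """
    net = []
    held = 0.0
    for t in range(1, len(prices)):
        # The position held over bar t was filled at the close of bar t-1.
        gross = held * (prices[t] / prices[t - 1] - 1.0)
        # At the close of bar t, fill the target decided at bar t-1.
        new_position = targets[t - 1]
        cost = abs(new_position - held) * cost_bps / 10_000.0
        net.append(gross - cost)
        held = new_position
    return net


# Made-up calendar and prices for illustration.
days = [date(2024, 1, d) for d in range(1, 11)]
train_idx, test_idx = time_split(days, date(2024, 1, 5), timedelta(days=2))

prices = [100.0, 101.0, 102.01, 102.01]
targets = [1.0, 1.0, 0.0, 0.0]
net_returns = backtest(prices, targets, cost_bps=10.0)
print(train_idx, test_idx, net_returns)
```

Stating the fill convention and cost rate as explicit parameters makes them reportable alongside results, which is the point of the practice. Real cost models (spreads, slippage, market impact) are considerably richer than a flat rate.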
Topics
- Large Language Models
- Algorithmic Trading
- Financial Markets
- Agentic AI
- Reproducibility
- Evaluation Protocols
Best for: Research Scientist, AI Scientist, MLOps Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.