Scaling Document Ingestion for AI Agents: Lessons from the field with StackAI
Summary
Stack AI is an end-to-end AI agent platform that orchestrates and automates agentic AI workflows for enterprises, with over 100 integrations for data sources such as SharePoint, Snowflake, Confluence, and Google Drive. The platform is LLM-agnostic, so customers can choose providers such as Anthropic or OpenAI, and it pairs a no-code drag-and-drop interface with advanced features such as Python nodes and sandboxed browsing. Stack AI processes millions of complex, heterogeneous enterprise documents, including financial statements and compliance reports, which often mix formats, tables, figures, and even handwritten notes. The company prioritizes robust parsing and retrieval pipelines because, at this scale, even a 2% parsing error rate affects a large absolute number of documents. Stack AI's parsing infrastructure uses LlamaParse as its primary workhorse for PDFs, is distributed using Temporal, and supports up to 50 concurrent parsing jobs across 10 orchestrated stages, with 14 different parsing presets.
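To make the 2% figure concrete, the short calculation below shows how a fixed error rate translates into absolute document counts. It is a minimal sketch: the corpus sizes are illustrative assumptions, not numbers reported by Stack AI, and it assumes the rate applies uniformly per document.

```python
def expected_failures(num_documents: int, error_rate: float) -> int:
    """Expected number of documents with a parsing error, assuming a uniform rate."""
    return round(num_documents * error_rate)


# Hypothetical corpus sizes, chosen only to illustrate scale.
for corpus_size in (10_000, 1_000_000, 5_000_000):
    affected = expected_failures(corpus_size, 0.02)
    print(f"{corpus_size:>9,} documents -> about {affected:,} with parsing errors")
```

A rate that looks negligible on a small sample becomes tens of thousands of flawed inputs once the corpus reaches the millions.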
Key takeaway
MLOps engineers building AI agent workflows over large, complex enterprise documents should prioritize a resilient and scalable parsing infrastructure. Implement a multi-parser toolkit, as Stack AI does with LlamaParse as its primary PDF parser, to match parsing quality to document content and agent task, optimizing both cost and accuracy. Consider agentic file access, which lets an agent retrieve only the specific information it needs from a document instead of parsing the entire document at high quality; this avoids wasted resources and improves overall agent performance.
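One way to picture agentic file access is a tool that parses pages only when the agent asks for them. The sketch below is an assumption for illustration, not Stack AI's implementation: `LazyDocument`, `read_pages`, and the `parse_page` callable are hypothetical names, and the "parser" is a stand-in function.

```python
from typing import Callable, Dict, List


class LazyDocument:
    """Hypothetical wrapper that parses pages on demand instead of up front."""

    def __init__(self, raw_pages: List[str], parse_page: Callable[[str], str]):
        self._raw_pages = raw_pages
        self._parse_page = parse_page      # stand-in for an expensive parser call
        self._parsed: Dict[int, str] = {}  # page number -> parsed text

    def read_pages(self, start: int, end: int) -> List[str]:
        """Return parsed text for pages start..end (1-indexed, inclusive)."""
        if start < 1 or end > len(self._raw_pages) or start > end:
            raise ValueError("page range out of bounds")
        result = []
        for number in range(start, end + 1):
            if number not in self._parsed:
                self._parsed[number] = self._parse_page(self._raw_pages[number - 1])
            result.append(self._parsed[number])
        return result

    @property
    def parsed_count(self) -> int:
        """Number of pages that have actually been parsed so far."""
        return len(self._parsed)


# Usage: the agent needs only pages 3-4 of a 100-page report.
doc = LazyDocument([f"raw page {i}" for i in range(1, 101)], parse_page=str.upper)
print(doc.read_pages(3, 4))
print(doc.parsed_count)
```

Only the requested pages incur parsing cost, and repeated reads of the same page reuse the earlier result.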
Key insights
Reliable, scalable, and cost-efficient document parsing is critical for high-fidelity AI agent performance with enterprise data.
Principles
- Garbage in, garbage out applies to parsing.
- Structure carries significant meaning in documents.
- Parsing needs vary by document type and task.
Method
Implement a distributed parsing architecture built on a toolkit of multiple parsers. Route each page to a quality preset that matches its content complexity, which optimizes both cost and accuracy, and integrate agentic file access for targeted retrieval.
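The routing idea can be sketched as a small rule-based router. Everything named here is an assumption for illustration: the preset names, their relative costs, and the page features are hypothetical and do not correspond to Stack AI's 14 presets or to any LlamaParse setting.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Page:
    number: int
    has_tables: bool = False
    has_figures: bool = False
    has_handwriting: bool = False


# Hypothetical presets mapped to a relative cost per page.
PRESET_COST: Dict[str, int] = {"fast": 1, "balanced": 3, "premium": 10}


def choose_preset(page: Page) -> str:
    """Pick the cheapest preset expected to handle the page's content."""
    if page.has_handwriting:
        return "premium"
    if page.has_tables or page.has_figures:
        return "balanced"
    return "fast"


def route(pages: List[Page]) -> Dict[str, List[int]]:
    """Group page numbers by the preset each page is routed to."""
    plan: Dict[str, List[int]] = {}
    for page in pages:
        plan.setdefault(choose_preset(page), []).append(page.number)
    return plan


def plan_cost(plan: Dict[str, List[int]]) -> int:
    """Total relative cost of a routing plan."""
    return sum(PRESET_COST[name] * len(numbers) for name, numbers in plan.items())


pages = [Page(1), Page(2, has_tables=True), Page(3, has_handwriting=True), Page(4)]
plan = route(pages)
print(plan)
print(plan_cost(plan), "vs all-premium:", PRESET_COST["premium"] * len(pages))
```

In a real pipeline the page features would come from a cheap first-pass classifier rather than hand-set flags; the point is that only complex pages pay for the expensive preset.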
In practice
- Start simple, then expand parsing capabilities.
- Implement caching for frequently re-uploaded documents.
- Collect realistic document samples for benchmarking.
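The caching practice listed above can be sketched with a content-hash key, so that a re-uploaded file is recognized even if its filename changes. This is a minimal in-memory illustration under stated assumptions: `ParseCache` and `parse_fn` are hypothetical names, and a production system would likely use a shared store rather than a dictionary.

```python
import hashlib
from typing import Callable, Dict, Tuple


class ParseCache:
    """Hypothetical cache keyed by file content hash and parsing preset."""

    def __init__(self, parse_fn: Callable[[bytes, str], str]):
        self._parse_fn = parse_fn  # stand-in for the real (expensive) parser
        self._store: Dict[Tuple[str, str], str] = {}
        self.hits = 0
        self.misses = 0

    def parse(self, content: bytes, preset: str = "fast") -> str:
        # Identical bytes produce the same SHA-256 digest regardless of filename.
        key = (hashlib.sha256(content).hexdigest(), preset)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._parse_fn(content, preset)
        self._store[key] = result
        return result


# Usage with a fake parser that just decodes the bytes.
cache = ParseCache(lambda content, preset: content.decode("utf-8"))
cache.parse(b"quarterly report")
cache.parse(b"quarterly report")  # same bytes re-uploaded: served from cache
print(cache.hits, cache.misses)
```

Including the preset in the key keeps a low-quality parse from being served when a higher-quality one is requested for the same file.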
Topics
- AI Agent Platforms
- Document Ingestion
- Parsing and Retrieval
- LlamaParse
- Unstructured Data
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by LlamaIndex.