How Associa transforms document classification with the GenAI IDP Accelerator and Amazon Bedrock
Summary
Associa, North America's largest community management company, collaborated with the AWS Generative AI Innovation Center to develop a generative AI-powered document classification system. This system addresses the challenge of manually categorizing approximately 48 million documents across 26 TB of data, a process that was time-consuming and error-prone for its 15,000 employees. The solution, built using the Generative AI Intelligent Document Processing (GenAI IDP) Accelerator and Amazon Bedrock, automatically classifies incoming documents with high accuracy, aiming for operational efficiencies and cost savings. The development involved optimizing prompt input (full PDF vs. first page only), prompt design (multimodal with OCR vs. image only), and model choice (Amazon Nova Lite, Pro, Premier, and Anthropic's Claude Sonnet 4). The final system, utilizing Amazon Nova Pro, achieved 95% accuracy at an average cost of 0.55 cents per document by processing only the first page with combined OCR and image data.
Key takeaway
For AI Engineers and ML teams developing intelligent document processing solutions, prioritizing prompt engineering and model selection is crucial. Your evaluation should focus on optimizing input length and multimodal data integration, as demonstrated by the 95% accuracy and 0.55 cents/document cost achieved with first-page-only OCR+image input using Amazon Nova Pro. This approach can significantly reduce operational costs and improve classification accuracy, especially for ambiguous document types, directly impacting project ROI and system reliability.
Key insights
Optimizing prompt input and design significantly enhances generative AI document classification accuracy and cost-efficiency.
Principles
- First-page-only processing improves accuracy and reduces cost.
- OCR data combined with images enhances classification accuracy.
- Model choice balances accuracy, especially for "Unknown" types, and cost.
Method
The GenAI IDP Accelerator uses OCR (Amazon Textract) and generative AI (Amazon Bedrock) to convert unstructured documents into structured data, supporting scalable, modular document processing workflows.
In practice
- Evaluate prompt input (full vs. first page) for efficiency.
- Test multimodal prompts with OCR and image data.
- Benchmark multiple LLMs for accuracy and cost trade-offs.
Topics
- Generative AI
- Document Classification
- Amazon Bedrock
- Intelligent Document Processing
- Amazon Textract
Code references
Best for: Machine Learning Engineer, AI Engineer, Director of AI/ML
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.