Starting with Production in Mind: A Blueprint for Affordable Enterprise-Grade RAG on VMware Tanzu
Summary
Intel and T-Systems collaborated to deploy Intel® AI for Enterprise RAG (ERAG) on T-Systems' production VMware Tanzu infrastructure, demonstrating that an enterprise-grade AI chat assistant can run on cost-effective virtual machines. The solution runs on 4th Generation Intel® Xeon® Scalable processors, eliminating the need for dedicated GPU hardware. The collaboration validated responsive, interactive performance for dozens of concurrent users and accurate answers through advanced retrieval techniques such as "similarity search with siblings." It also confirmed that the system meets critical operational demands, including automated recovery from infrastructure maintenance, full backup capabilities using Kasten K10 by Veeam, and zero-data-loss version upgrades. This CPU-only approach significantly reduces hardware costs and enables deployment on existing enterprise virtualization platforms.
Key takeaway
For AI architects and MLOps engineers evaluating enterprise RAG deployment strategies, this collaboration demonstrates that production-grade AI assistants do not require costly GPU infrastructure. Teams can deploy Intel® AI for Enterprise RAG on existing VMware Tanzu environments with 4th Generation Intel® Xeon® Scalable processors while retaining operational resilience, zero-data-loss upgrades, and accurate retrieval. This approach significantly reduces hardware costs and simplifies infrastructure management, helping organizations accelerate AI adoption.
Key insights
Enterprise-grade RAG can be deployed on existing CPU-only virtualization infrastructure, meeting production demands without GPUs.
Principles
- CPU-only inference is viable for production RAG.
- Operational resilience is paramount for enterprise AI.
- Contextual retrieval enhances RAG answer accuracy.
Method
ERAG uses an Ansible-based declarative deployment, a RAG Watcher component for automated recovery, and a two-phase upgrade workflow that validates data consistency.
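The summary does not spell out what the two upgrade phases are, so the following is only a generic sketch of an upgrade guarded by a data consistency check, not ERAG's actual workflow. It assumes a hypothetical in-memory `store` (a dict of document IDs to text) and a hypothetical `apply_upgrade` callback: the first phase records a fingerprint and a restorable copy, the second applies the upgrade and restores the copy if the fingerprint changed.

```python
import hashlib


def fingerprint(records):
    """Return (record count, SHA-256 digest) computed over sorted keys."""
    digest = hashlib.sha256()
    for key in sorted(records):
        digest.update(key.encode("utf-8"))
        digest.update(b"\x00")
        digest.update(records[key].encode("utf-8"))
        digest.update(b"\x00")
    return len(records), digest.hexdigest()


def upgrade_with_validation(store, apply_upgrade):
    """Hypothetical two-phase upgrade: prepare, then apply and validate."""
    # Phase 1 (assumed): capture the pre-upgrade state and a restorable copy.
    before = fingerprint(store)
    backup = dict(store)

    # Phase 2 (assumed): run the upgrade, then check that the data is unchanged.
    apply_upgrade(store)
    if fingerprint(store) != before:
        # Validation failed: put the original data back and report failure.
        store.clear()
        store.update(backup)
        return False
    return True
```

In a real deployment the copy would come from a proper backup tool and the fingerprint from the vector database and document stores rather than a Python dict; the point of the sketch is only the ordering of "record, upgrade, compare, restore on mismatch."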
In practice
- Utilize 4th Gen Intel® Xeon® Scalable processors for RAG inference.
- Employ "similarity search with siblings" to improve RAG context.
- Integrate RAG with existing enterprise backup tools like Kasten K10.
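The summary names "similarity search with siblings" without defining it. One plausible reading, sketched below under that assumption, is that each top-ranked chunk is returned together with its neighboring chunks from the same document, so the language model sees the surrounding context. The `Chunk` type, the `search_with_siblings` function, the `window` parameter, and the toy two-dimensional embeddings are all hypothetical and are not taken from ERAG.

```python
import math
from typing import NamedTuple, Tuple


class Chunk(NamedTuple):
    doc_id: str
    position: int                 # index of the chunk within its document
    text: str
    embedding: Tuple[float, ...]  # toy vector standing in for a real embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_with_siblings(query_vec, chunks, k=1, window=1):
    """Return the top-k chunks plus up to `window` neighbors on each side."""
    index = {(c.doc_id, c.position): c for c in chunks}
    hits = sorted(chunks, key=lambda c: cosine(query_vec, c.embedding),
                  reverse=True)[:k]

    selected = {}
    for hit in hits:
        for pos in range(hit.position - window, hit.position + window + 1):
            sibling = index.get((hit.doc_id, pos))
            if sibling is not None:
                # Keyed by (doc_id, position) so overlapping windows dedupe.
                selected[(sibling.doc_id, sibling.position)] = sibling

    # Return chunks in document order so the context reads naturally.
    return [selected[key] for key in sorted(selected)]
```

With `window=0` this reduces to plain top-k similarity search; larger windows trade a longer prompt for more surrounding context.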
Topics
- Enterprise RAG
- CPU-only AI Inference
- VMware Tanzu
- Operational Resilience
- Intel Xeon Processors
- Data Sovereignty
- Kubernetes
Best for: CTO, VP of Engineering/Data, AI Engineer, MLOps Engineer, AI Architect, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence (AI) articles.