8 billion tokens a day forced AT&T to rethink AI orchestration — and cut costs by 90%
Summary
AT&T, facing daily usage of 8 billion tokens, re-architected its AI orchestration layer to cut costs and improve performance. Chief Data Officer Andy Markus and his team built a multi-agent stack on LangChain in which large language models (LLMs) act as "super agents" that direct specialized small language models (SLMs), the "worker agents." The approach delivered cost savings of up to 90% and lower latency. The re-architected stack, deployed with Microsoft Azure, powers Ask AT&T Workflows, a drag-and-drop agent builder that employees use to automate tasks. The system integrates proprietary AT&T tools for document processing, natural language-to-SQL conversion, and image analysis, while maintaining human oversight and data isolation. AT&T emphasizes "interchangeable and selectable" models and rigorous evaluation, and cautions against over-engineering solutions.
Key takeaway
For AI Architects and MLOps Engineers scaling large language model deployments, consider adopting a multi-agent orchestration strategy with SLMs. AT&T's cost savings of up to 90% suggest that breaking complex tasks into smaller, purpose-driven agent workflows can significantly improve efficiency and reduce operational expenses. Before committing to agentic AI, evaluate whether a simpler, single-turn generative solution would suffice.
Key insights
Having LLM "super agents" orchestrate specialized small language models cut AT&T's costs by up to 90% and improved latency.
Principles
- Prioritize SLMs for domain-specific accuracy.
- Adopt "interchangeable and selectable" models.
- Evaluate accuracy, cost, and responsiveness.
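One way to make these principles concrete is to score every candidate model on the same three axes and pick the cheapest one that clears the bar. The sketch below is illustrative only: the model names, scores, prices, and thresholds are hypothetical and are not AT&T's figures or method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelScore:
    """Evaluation results for one candidate model (all values hypothetical)."""
    name: str
    accuracy: float            # fraction of evaluation cases passed, 0..1
    cost_per_1k_tokens: float  # illustrative price, not a real quote
    p95_latency_ms: float      # responsiveness measured on the same eval set


def pick_model(candidates: List[ModelScore],
               min_accuracy: float,
               max_latency_ms: float) -> Optional[ModelScore]:
    """Return the cheapest candidate that meets both thresholds, or None."""
    eligible = [
        m for m in candidates
        if m.accuracy >= min_accuracy and m.p95_latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


# Models are interchangeable entries in a list, so swapping one in or out
# does not touch the selection logic.
CANDIDATES = [
    ModelScore("large-general-llm", 0.95, 0.0100, 1800.0),
    ModelScore("small-domain-slm", 0.93, 0.0010, 300.0),
    ModelScore("tiny-slm", 0.80, 0.0002, 120.0),
]

if __name__ == "__main__":
    choice = pick_model(CANDIDATES, min_accuracy=0.90, max_latency_ms=1000.0)
    print(choice.name if choice else "no model meets the thresholds")
```

Under these made-up numbers the domain SLM is selected: the large model misses the latency limit and the tiny model misses the accuracy floor.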
Method
Build a multi-agent stack using a framework like LangChain, where LLMs direct specialized SLMs. Integrate proprietary tools and ensure human oversight with logging and role-based access.
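The method above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not AT&T's implementation and not LangChain code: the `plan` function stands in for the LLM "super agent" using keyword rules, the two worker functions stand in for SLM calls, and the role table and worker names are hypothetical.

```python
import logging
from typing import Callable, Dict, List, Set

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orchestrator")

Worker = Callable[[str], str]


def sql_worker(task: str) -> str:
    """Stand-in for an SLM that converts natural language to SQL."""
    return f"[sql-slm] drafted query for: {task}"


def doc_worker(task: str) -> str:
    """Stand-in for an SLM that processes documents."""
    return f"[doc-slm] extracted fields from: {task}"


# Hypothetical registry of worker agents and role-based access rules.
WORKERS: Dict[str, Worker] = {"sql": sql_worker, "document": doc_worker}
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "analyst": {"sql", "document"},
    "support": {"document"},
}


def plan(request: str) -> List[str]:
    """Stand-in for the super agent: keyword rules instead of a model call."""
    text = request.lower()
    steps: List[str] = []
    if any(word in text for word in ("invoice", "document", "pdf")):
        steps.append("document")
    if any(word in text for word in ("how many", "report", "total")):
        steps.append("sql")
    return steps


def run(request: str, role: str) -> List[str]:
    """Dispatch each planned step to its worker, enforcing role access."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    results: List[str] = []
    for step in plan(request):
        if step not in allowed:
            log.warning("role=%s denied worker=%s", role, step)
            results.append(f"[denied] {step}")
            continue
        log.info("role=%s dispatch worker=%s", role, step)
        results.append(WORKERS[step](request))
    return results


if __name__ == "__main__":
    for line in run("Total the invoice amounts", role="support"):
        print(line)
```

In a real stack the planner and workers would be model calls behind the same interfaces; keeping them as plain callables is what lets one model be swapped for another without changing the dispatch loop.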
In practice
- Use SLMs for specific tasks to reduce LLM reliance.
- Implement human-in-the-loop for agentic workflows.
- Automate software development with "AI-fueled coding."
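A human-in-the-loop step can be as simple as an approval gate in front of any action the workflow flags as risky. The sketch below assumes a hypothetical `ProposedAction` type and an `approve` callback that would, in practice, prompt a person; neither comes from the original article.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProposedAction:
    """An action an agent wants to take (hypothetical structure)."""
    description: str
    risky: bool


@dataclass
class Workflow:
    """Runs actions, pausing for human approval on risky ones."""
    approve: Callable[[ProposedAction], bool]
    audit_log: List[str] = field(default_factory=list)

    def execute(self, action: ProposedAction) -> bool:
        # Risky actions run only if the human reviewer approves them.
        if action.risky and not self.approve(action):
            self.audit_log.append(f"rejected: {action.description}")
            return False
        self.audit_log.append(f"executed: {action.description}")
        return True


if __name__ == "__main__":
    # A reviewer that rejects everything, standing in for a real prompt.
    wf = Workflow(approve=lambda action: False)
    wf.execute(ProposedAction("summarize ticket", risky=False))
    wf.execute(ProposedAction("update billing record", risky=True))
    print(wf.audit_log)
```

Every decision lands in the audit log, so the record shows what was rejected as well as what ran.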
Topics
- AI Orchestration
- Small Language Models
- Multi-Agent Systems
- AI-fueled Coding
- Cost Optimization
Best for: CTO, AI Architect, MLOps Engineer, Machine Learning Engineer, Data Scientist, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.