8 billion tokens a day forced AT&T to rethink AI orchestration — and cut costs by 90%

· Source: VentureBeat · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, short

Summary

AT&T, handling 8 billion tokens a day, re-architected its AI orchestration layer to cut costs and improve performance. Chief Data Officer Andy Markus and his team built a multi-agent stack with LangChain in which large language models (LLMs) act as "super agents" that direct specialized small language models (SLMs), or "worker agents." The approach yielded cost savings of up to 90% and lower latency. The re-architected stack, deployed with Microsoft Azure, powers Ask AT&T Workflows, a drag-and-drop agent builder that employees use to automate tasks. The system integrates proprietary AT&T tools for document processing, natural-language-to-SQL conversion, and image analysis, with human oversight and data isolation built in. AT&T emphasizes keeping models "interchangeable and selectable," evaluating them rigorously, and avoiding over-engineered solutions.
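The super-agent/worker-agent split can be illustrated with a small, framework-agnostic sketch: a large planner model breaks a task into "skill: subtask" lines, and each line is routed to a small worker model looked up by name in a registry, which is one way to keep models "interchangeable and selectable." This is plain Python, not LangChain's API or AT&T's code; the registry, the planner's output format, the model names, and the lambda stubs standing in for real LLM/SLM endpoints are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Any callable from prompt to completion counts as a model here. In production it
# would wrap an LLM or SLM endpoint; in this hypothetical sketch it is a stub.
Model = Callable[[str], str]


@dataclass
class ModelRegistry:
    """Keeps models interchangeable: callers ask for a name, not a vendor SDK."""
    models: Dict[str, Model] = field(default_factory=dict)

    def register(self, name: str, model: Model) -> None:
        self.models[name] = model

    def get(self, name: str) -> Model:
        return self.models[name]


@dataclass
class SuperAgent:
    """A large model plans the task; small worker models execute each step."""
    registry: ModelRegistry
    planner: str             # registry name of the large "super agent" model
    workers: Dict[str, str]  # skill -> registry name of a small worker model

    def plan(self, task: str) -> List[Tuple[str, str]]:
        # Assumed planner contract: one "skill: subtask" pair per line.
        steps = []
        for line in self.registry.get(self.planner)(task).splitlines():
            skill, sep, subtask = line.partition(":")
            if sep and subtask.strip():
                steps.append((skill.strip(), subtask.strip()))
        return steps

    def run(self, task: str) -> List[str]:
        results = []
        for skill, subtask in self.plan(task):
            # Skills with no registered worker fall back to the large model.
            name = self.workers.get(skill, self.planner)
            results.append(self.registry.get(name)(subtask))
        return results


# Hypothetical stubs standing in for real models.
registry = ModelRegistry()
registry.register("planner-llm", lambda task: "sql: count open tickets\nsummarize: weekly ticket trends")
registry.register("sql-slm", lambda q: f"SELECT COUNT(*) FROM tickets  -- {q}")
registry.register("summary-slm", lambda text: f"summary of {text}")

agent = SuperAgent(registry, "planner-llm", {"sql": "sql-slm", "summarize": "summary-slm"})
result = agent.run("Prepare the weekly ticket report")
print(result)
# ['SELECT COUNT(*) FROM tickets  -- count open tickets', 'summary of weekly ticket trends']
```

Because workers are resolved by name at call time, swapping in a different SLM only means re-registering "sql-slm"; the super agent and its routing table stay unchanged.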

Key takeaway

AI architects and MLOps engineers scaling large language model deployments should consider a multi-agent orchestration strategy built on SLMs. AT&T's results, with cost savings of up to 90%, suggest that breaking complex tasks into smaller, purpose-built agent workflows can significantly improve efficiency and reduce operating expenses. Before reaching for agentic AI, though, check whether a simpler single-turn generative solution is enough.
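The advice to check whether a single-turn solution suffices can be made concrete with a small evaluation gate: score each candidate pipeline on a labeled set and keep the cheapest one that clears an accuracy bar. This is a hypothetical sketch; the ticket examples, the two stub pipelines, the relative cost units, and the thresholds are invented, and exact-match scoring stands in for whatever evaluation AT&T actually runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Pipeline = Callable[[str], str]
Dataset = List[Tuple[str, str]]  # (input text, expected label)


@dataclass
class Candidate:
    name: str
    pipeline: Pipeline
    cost_per_call: float  # hypothetical relative cost units


def accuracy(pipeline: Pipeline, dataset: Dataset) -> float:
    """Exact-match accuracy; real evaluations usually need richer scoring."""
    hits = sum(pipeline(text) == expected for text, expected in dataset)
    return hits / len(dataset)


def pick_simplest(candidates: List[Candidate], dataset: Dataset,
                  threshold: float) -> Optional[Candidate]:
    """Return the cheapest candidate that clears the accuracy bar, or None."""
    for cand in sorted(candidates, key=lambda c: c.cost_per_call):
        if accuracy(cand.pipeline, dataset) >= threshold:
            return cand
    return None


def single_turn(text: str) -> str:
    # Stand-in for one prompt to one model: exactly one label per ticket.
    return "network" if "router" in text.lower() else "billing"


def agentic(text: str) -> str:
    # Stand-in for a multi-step workflow that can assign several labels.
    t = text.lower()
    labels = []
    if "bill" in t or "plan" in t:
        labels.append("billing")
    if "router" in t or "coverage" in t:
        labels.append("network")
    return "+".join(labels)


tickets = [
    ("My router keeps dropping", "network"),
    ("Why is my bill higher this month", "billing"),
    ("Upgrade my plan and check coverage", "billing+network"),
]
candidates = [Candidate("agentic", agentic, 5.0), Candidate("single-turn", single_turn, 1.0)]

print(pick_simplest(candidates, tickets, threshold=0.6).name)  # single-turn
print(pick_simplest(candidates, tickets, threshold=0.9).name)  # agentic
```

With a lenient bar the cheap single-turn pipeline wins; only when the bar rises above what it can reach does the costlier agentic workflow earn its place.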

Key insights

Orchestrating specialized small language models with LLM "super agents" can cut costs sharply (by up to 90% at AT&T) and improve latency.
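Where the savings come from can be seen with simple arithmetic: if a share of tokens moves from the large model to cheaper worker models, the blended per-token price falls. Only the 8 billion tokens per day comes from the article; the per-million-token prices and the routing share below are hypothetical, and the model ignores hosting, input-versus-output pricing, and the planner's own overhead.

```python
def daily_cost(tokens_per_day: float, usd_per_million_tokens: float) -> float:
    """Cost of one day's tokens at a flat per-million-token price."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens


def routed_savings(llm_price: float, slm_price: float, slm_share: float) -> float:
    """Fractional saving versus sending every token to the large model.

    slm_share is the fraction of tokens handled by small worker models;
    the remainder stays on the large "super agent" model.
    """
    if not 0.0 <= slm_share <= 1.0:
        raise ValueError("slm_share must be between 0 and 1")
    blended_price = (1 - slm_share) * llm_price + slm_share * slm_price
    return 1 - blended_price / llm_price


TOKENS_PER_DAY = 8_000_000_000  # from the article
LLM_PRICE = 10.0  # hypothetical USD per million tokens
SLM_PRICE = 1.0   # hypothetical USD per million tokens
SLM_SHARE = 0.8   # hypothetical share of tokens routed to SLM workers

all_llm = daily_cost(TOKENS_PER_DAY, LLM_PRICE)
saving = routed_savings(LLM_PRICE, SLM_PRICE, SLM_SHARE)
print(f"All-LLM cost per day: ${all_llm:,.0f}")  # All-LLM cost per day: $80,000
print(f"Saving with routing: {saving:.0%}")      # Saving with routing: 72%
```

The saving simplifies to slm_share × (1 − slm_price / llm_price), so it depends on how much traffic the workers absorb and how much cheaper they are, not on total volume; volume only scales the absolute dollars.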

Principles

Method

Build a multi-agent stack using a framework like LangChain, where LLMs direct specialized SLMs. Integrate proprietary tools and ensure human oversight with logging and role-based access.
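The oversight part of the method can be sketched as a gateway that every agent tool call passes through: it checks the caller's role, optionally asks a human reviewer, and writes an audit log entry. This is plain Python rather than LangChain's tool API; the tool names (nl_to_sql, summarize_doc), the roles, and the lambda standing in for a human reviewer are hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
audit = logging.getLogger("agent.audit")


class AccessDenied(Exception):
    """Raised when a role may not use a tool or a reviewer rejects the call."""


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    allowed_roles: FrozenSet[str]
    needs_review: bool = False  # route the call to a human before executing


class ToolGateway:
    """Single choke point between agents and tools: checks roles, asks a reviewer, logs."""

    def __init__(self, reviewer: Callable[[str, str, str], bool]) -> None:
        self._tools: Dict[str, Tool] = {}
        self._reviewer = reviewer  # (user, tool_name, argument) -> approved?

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, user: str, role: str, tool_name: str, argument: str) -> str:
        tool = self._tools[tool_name]
        if role not in tool.allowed_roles:
            audit.warning("denied user=%s role=%s tool=%s", user, role, tool_name)
            raise AccessDenied(f"role {role!r} may not use {tool_name!r}")
        if tool.needs_review and not self._reviewer(user, tool_name, argument):
            audit.warning("rejected by reviewer user=%s tool=%s", user, tool_name)
            raise AccessDenied(f"reviewer rejected {tool_name!r} call")
        audit.info("call user=%s role=%s tool=%s arg=%r", user, role, tool_name, argument)
        return tool.run(argument)


# Hypothetical tools; a real reviewer would be a person, not this lambda.
gateway = ToolGateway(reviewer=lambda user, tool, arg: "DELETE" not in arg.upper())
gateway.register(Tool("nl_to_sql", lambda q: f"-- SQL for: {q}", frozenset({"analyst"}), needs_review=True))
gateway.register(Tool("summarize_doc", lambda d: d[:40], frozenset({"analyst", "agent"})))

print(gateway.call("alice", "analyst", "nl_to_sql", "count open tickets by region"))
try:
    gateway.call("bot-7", "agent", "nl_to_sql", "count open tickets")
except AccessDenied as err:
    print("blocked:", err)
```

Putting role checks, review, and logging in one gateway means individual agents never hold direct tool access, so the audit trail covers every call regardless of which model issued it.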

Best for: CTO, AI Architect, MLOps Engineer, Machine Learning Engineer, Data Scientist, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.