How Shopify built an AI stack that doesn't care which models survive
Summary
Shopify has developed a robust AI stack designed for model agnosticism and resilience, as detailed by Farhan Thawar, head of engineering. A key component is an internal LLM proxy that provides engineers access to multiple AI providers, automatically failing over to alternatives like Claude Opus or GPT 5.5 if a model like Claude Fable 5 becomes unavailable. This system also enables bulk token purchasing and usage reporting. Furthermore, Shopify employs distillation, training smaller, specialized student models (SLMs) from larger teacher models (e.g., Opus 4.8 to Qwen 3.5) for tasks like its Sidekick AI assistant. These SLMs can be 2x to 30x cheaper and faster, often achieving higher accuracy for narrow tasks. The Universal Distillation Platform (UDP) automates this process, and an internal platform, Tangle, visualizes pipeline execution. Shopify also uses a usage dashboard with "circuit breakers" to manage token spend and promotes a philosophy of maximizing AI's strategic value.
Key takeaway
AI architects and MLOps engineers building enterprise AI systems should prioritize a multi-provider, infrastructure-first approach to mitigate vendor lock-in and ensure operational continuity. Implement an LLM proxy for automatic failover, and explore model distillation for specialized tasks to optimize cost and performance. The strategy should maximize AI's utility across the organization, moving beyond simple adoption to strategic, cost-aware integration.
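The failover behavior described in the summary can be illustrated with a minimal sketch. It is not Shopify's proxy: the `ProviderUnavailable` exception, the `complete_with_failover` function, and the idea of passing providers as an ordered list of plain callables are all assumptions made for illustration. Each callable stands in for a real provider client.

```python
from __future__ import annotations

from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error a provider callable raises when its model cannot serve a request."""


# A provider is modeled as a function from prompt to completion text (an assumption).
Provider = Callable[[str], str]


def complete_with_failover(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, completion) from the first that works."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            # Record the failure and fall through to the next provider in the list.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Callers depend only on the ordered list, so swapping or reordering models is a configuration change rather than a code change, which is the point of keeping the stack model-agnostic.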
Key insights
Model-agnostic AI infrastructure keeps systems resilient and cost-efficient as LLM providers and models change.
Principles
- Prioritize infrastructure over features.
- Embrace multi-provider AI strategies.
- Distill large models for specialized tasks.
Method
Shopify's Universal Distillation Platform (UDP) takes a teacher model, data, evals, and a target (student) model, for example distilling Opus 4.8 into Qwen 3.5. A run takes about a day and returns an evaluation of the student's speed, cost, and accuracy to inform the deployment decision.
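To make the inputs and outputs of such a run concrete, here is a rough sketch of how a job and its evaluation might be represented. The article does not describe UDP's actual interface, so every name here (`DistillationJob`, `EvalReport`, `compare`, `should_deploy`) and the deployment rule itself are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistillationJob:
    """The four inputs named above; field names are illustrative assumptions."""
    teacher: str
    student: str
    dataset_path: str
    eval_suite: str


@dataclass(frozen=True)
class EvalReport:
    model: str
    accuracy: float              # fraction of eval cases passed, 0.0 to 1.0
    latency_ms: float            # mean latency per request
    cost_per_1k_requests: float  # in any consistent currency unit


def compare(teacher: EvalReport, student: EvalReport) -> dict[str, float]:
    """Summarize the student against the teacher on speed, cost, and accuracy."""
    return {
        "speedup": teacher.latency_ms / student.latency_ms,
        "cost_reduction": teacher.cost_per_1k_requests / student.cost_per_1k_requests,
        "accuracy_delta": student.accuracy - teacher.accuracy,
    }


def should_deploy(summary: dict[str, float], max_accuracy_drop: float = 0.01) -> bool:
    """Assumed rule: deploy if the student is cheaper and loses at most max_accuracy_drop accuracy."""
    return summary["accuracy_delta"] >= -max_accuracy_drop and summary["cost_reduction"] > 1.0
```

The threshold in `should_deploy` is a placeholder. In practice the acceptable accuracy trade-off would depend on the task being distilled.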
In practice
- Implement an LLM proxy for failover.
- Use usage dashboards to monitor token spend.
- Develop internal platforms for AI workflow visualization.
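One way to picture a token-spend "circuit breaker" behind a usage dashboard is a per-team counter that blocks further requests once a budget is reached. The article does not say how Shopify scopes or enforces its breakers, so the per-team scope, the `TokenCircuitBreaker` class, and the `BudgetExceeded` exception below are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Hypothetical error raised when a team has used up its token budget."""


@dataclass
class TokenCircuitBreaker:
    limit_tokens: int                                    # budget per team for the current period
    used: dict[str, int] = field(default_factory=dict)   # tokens consumed so far, keyed by team

    def record(self, team: str, tokens: int) -> None:
        """Add a completed request's token count to the team's running total."""
        self.used[team] = self.used.get(team, 0) + tokens

    def is_open(self, team: str) -> bool:
        """The breaker is 'open' (blocking) once usage reaches the limit."""
        return self.used.get(team, 0) >= self.limit_tokens

    def check(self, team: str) -> None:
        """Call before forwarding a request; raises if the team is over budget."""
        if self.is_open(team):
            raise BudgetExceeded(f"{team} reached its limit of {self.limit_tokens} tokens")
```

A proxy would call `check` before forwarding a request and `record` after the provider reports token usage, and the same `used` totals could feed the dashboard.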
Topics
- LLM Proxy
- Model Distillation
- AI Infrastructure
- Multi-provider AI
- AI Cost Optimization
- AI Agents
Best for: CTO, VP of Engineering/Data, AI Engineer, Director of AI/ML, MLOps Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.