How Spotify Taught an LLM to Think Like a Senior Data Analyst

· Source: Artificial Intelligence on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, medium

Summary

Spotify built "Vedder," an internal AI data assistant, after concluding that LLMs fail on data questions because they lack data context, not because they lack intelligence. Launched in August 2025, Vedder has been adopted by more than 2,100 employees, who have held 13,000+ conversations and exchanged 60,000+ messages across 177 data clusters covering domains such as advertising and finance. Simply dumping schemas into an LLM breaks down at Spotify's scale of 70,000+ datasets and 1.4 trillion daily data points, so Vedder adds a "Context Layer." This layer is built on a "Cluster Model" that organizes data knowledge into expert-owned domains, each containing detailed dataset descriptions, vetted question-SQL examples, and documentation of tribal knowledge. Crucially, Spotify found human curation of examples far superior to automation: only 12.5% of auto-generated question-SQL pairs were accepted. Vedder also uses a ReAct loop to generate SQL and continuously updated cluster health scores to keep the context accurate.

Key takeaway

AI engineers and ML architects building data-driven LLM applications should treat context engineering as more critical than prompt engineering. Shift your focus from making the model "smarter" to making the context smarter, and make sure domain experts own its curation. Implement a structured knowledge layer that segments data into expert-managed domains, each with a continuously monitored health score. Prioritize manual curation of examples over automation: auto-generated examples import noise, while vetted ones let the system give reliable, trusted answers.
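The summary does not say how Spotify computes a cluster health score, so the following is only a minimal sketch under one assumed definition: the share of a cluster's vetted question-SQL examples that still compile against the current schema. The function name `cluster_health`, the use of SQLite, and the scoring rule itself are illustrative assumptions, not Vedder's implementation.

```python
import sqlite3


def cluster_health(conn: sqlite3.Connection, examples: list[tuple[str, str]]) -> float:
    """Return the fraction of (question, SQL) examples that still compile.

    Hypothetical scoring rule: a stale example (for instance, one that
    references a dropped column) lowers the score.
    """
    if not examples:
        return 0.0  # an empty cluster has no trusted context to offer
    passing = 0
    for _question, sql in examples:
        try:
            # EXPLAIN makes SQLite compile the statement without running the query.
            conn.execute(f"EXPLAIN {sql}")
            passing += 1
        except sqlite3.Error:
            pass  # schema drift or invalid SQL: count as failing
    return passing / len(examples)
```

A score like this could be recomputed on a schedule and surfaced to the cluster's owners, so that a drop prompts experts to repair or retire stale examples.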

Key insights

LLMs fail for lack of meaningful context, not for lack of intelligence; expert-owned context is paramount.

Principles

Method

Implement a "Context Layer" using a "Cluster Model" where domain experts curate datasets, question-SQL pairs, and documentation for specific data domains, then use a ReAct loop for query generation.
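The method above can be sketched roughly as follows. This is a simplified illustration, not Spotify's code: the `Cluster` class, its fields, the word-overlap example ranking, the `llm` callable, and the use of SQLite as the query engine are all assumptions made for the example. The loop follows the ReAct pattern in a reduced form: the model proposes SQL (act), the engine's result or error is recorded (observe), and the history is passed back so the next attempt can correct itself.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Cluster:
    """One expert-owned data domain (hypothetical shape)."""
    name: str
    owner: str
    datasets: dict[str, str]            # table name -> curated description
    examples: list[tuple[str, str]]     # vetted (question, SQL) pairs
    docs: list[str] = field(default_factory=list)  # tribal knowledge notes

    def build_context(self, question: str, k: int = 2) -> str:
        """Assemble prompt context with the k examples most similar to the question."""
        q_words = set(question.lower().split())
        # Naive similarity: count of words shared with each example question.
        ranked = sorted(
            self.examples,
            key=lambda ex: len(q_words & set(ex[0].lower().split())),
            reverse=True,
        )
        lines = [f"Cluster: {self.name} (owner: {self.owner})"]
        lines += [f"Table {t}: {d}" for t, d in self.datasets.items()]
        lines += [f"Note: {d}" for d in self.docs]
        lines += [f"Q: {q}\nSQL: {s}" for q, s in ranked[:k]]
        return "\n".join(lines)


def react_sql(
    question: str,
    cluster: Cluster,
    llm: Callable[[str, str, list], str],
    conn: sqlite3.Connection,
    max_steps: int = 3,
) -> Optional[tuple[str, list]]:
    """Propose SQL, run it, and feed errors back until it succeeds or steps run out."""
    context = cluster.build_context(question)
    history: list[tuple[str, str]] = []   # (sql, observation) from earlier attempts
    for _ in range(max_steps):
        sql = llm(context, question, history)        # act: propose a query
        try:
            rows = conn.execute(sql).fetchall()      # observe: run it
        except sqlite3.Error as exc:
            history.append((sql, f"error: {exc}"))   # keep the error for the retry
            continue
        return sql, rows
    return None  # no working query within max_steps
```

Here `llm` stands in for whatever model call a real system would make; it receives the cluster context, the question, and the list of failed attempts. Keeping the curated examples inside the `Cluster` object reflects the article's point that the quality of the context, not the model, is what domain experts control.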

Best for: AI Engineer, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence on Medium.