Encoding Your Domain Expert: The Context Layer Behind Spotify's Data Assistant

Source: Spotify Engineering · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, medium

Summary

Spotify built "Vedder," an AI data assistant, to meet the demand for data insights across its 70,000+ datasets, which process 1.4 trillion data points daily. Conventional LLM approaches failed for two reasons: context windows are limited, and schemas alone do not convey critical business logic. Vedder has been in use since August 2025, with more than 2,100 Spotifiers holding 13,000+ conversations. It runs on a "cluster model" in which domain experts curate "clusters" of data. Each cluster contains the relevant datasets with full schemas, vetted question-and-SQL example "pairs," and supplementary business "docs." This human-curated context is what makes the answers trustworthy: in a trial, experts accepted only 12.5% of the question-SQL pairs generated automatically from query history, which shows how noisy raw query logs are. Cluster health scores are monitored continuously and prompt experts to update context as the data evolves.
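
To make the three parts of a cluster concrete, here is a minimal Python sketch of how one might be represented and rendered into prompt context. It is an illustration only, not Spotify's implementation: the class names, the fields, the to_prompt_context method, and the example dataset, owner, and SQL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionSqlPair:
    """A vetted natural-language question and the SQL that answers it."""
    question: str
    sql: str


@dataclass
class Cluster:
    """Hypothetical container for one expert-owned slice of context."""
    name: str
    owner: str
    schemas: dict = field(default_factory=dict)  # dataset name -> {column: type}
    pairs: list = field(default_factory=list)    # vetted QuestionSqlPair items
    docs: list = field(default_factory=list)     # free-text business notes

    def to_prompt_context(self) -> str:
        """Render the cluster as plain text that an LLM prompt could include."""
        lines = [f"Cluster: {self.name} (owner: {self.owner})"]
        for dataset, columns in self.schemas.items():
            cols = ", ".join(f"{c} {t}" for c, t in columns.items())
            lines.append(f"Dataset {dataset}: {cols}")
        for pair in self.pairs:
            lines.append(f"Q: {pair.question}")
            lines.append(f"SQL: {pair.sql}")
        for doc in self.docs:
            lines.append(f"Note: {doc}")
        return "\n".join(lines)


# Invented example data; none of these names come from the original article.
example = Cluster(
    name="podcast_listening",
    owner="podcast-analytics-team",
    schemas={"podcast_streams": {"user_id": "STRING", "ms_played": "INT64"}},
    pairs=[QuestionSqlPair(
        question="How many distinct listeners are there?",
        sql="SELECT COUNT(DISTINCT user_id) FROM podcast_streams",
    )],
    docs=["Illustrative note: business rules not visible in the schema go here."],
)
print(example.to_prompt_context())
```

Keeping schemas, pairs, and docs together under a named owner mirrors the idea that one expert is accountable for everything the assistant sees about that domain.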

Key takeaway

For AI architects and data scientists building internal data assistants, raw schemas and query logs alone are insufficient and untrustworthy as LLM context. Have domain experts curate and own specific data "clusters" with vetted examples and business context. This keeps answers accurate as usage scales, and it shifts experts from answering one-off questions to shaping a reliable knowledge layer that serves thousands. Monitor context health continuously to prevent degradation and maintain trust.

Key insights

Human-curated context, not raw schemas, is essential for trustworthy AI data assistants at scale.

Method

Spotify's data agent uses a ReAct loop, selecting context, writing SQL, running queries, and returning answers with sources.
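
The loop described above can be sketched in Python as an alternation of thought, action, and observation. This is a toy under stated assumptions, not Spotify's agent: a scripted policy stands in for the LLM, an in-memory SQLite database stands in for the warehouse, context selection is naive keyword overlap, and every name (scripted_policy, react_answer, the cluster dictionaries) is hypothetical.

```python
import sqlite3


def scripted_policy(question, history):
    """Stand-in for the LLM: returns (thought, next action) from past steps."""
    done = {step["action"] for step in history}
    if "select_context" not in done:
        return "I need the relevant cluster first.", "select_context"
    if "write_sql" not in done:
        return "I have context, so I can draft the query.", "write_sql"
    if "run_query" not in done:
        return "Execute the draft.", "run_query"
    return "I have the result and can answer with sources.", "finish"


def react_answer(question, clusters, conn, policy=scripted_policy, max_steps=6):
    """Run thought -> action -> observation steps until the policy finishes."""
    history, state = [], {}
    for _ in range(max_steps):
        thought, action = policy(question, history)
        if action == "select_context":
            # Assumption: choose the cluster whose keywords overlap most.
            words = set(question.lower().split())
            state["cluster"] = max(
                clusters, key=lambda c: len(words & c["keywords"]))
            observation = state["cluster"]["name"]
        elif action == "write_sql":
            # Assumption: reuse a vetted example instead of generating SQL.
            state["sql"] = state["cluster"]["example_sql"]
            observation = state["sql"]
        elif action == "run_query":
            state["rows"] = conn.execute(state["sql"]).fetchall()
            observation = state["rows"]
        elif action == "finish":
            return {"answer": state["rows"], "sql": state["sql"],
                    "sources": [state["cluster"]["name"]], "trace": history}
        else:
            raise ValueError(f"unknown action: {action}")
        history.append(
            {"thought": thought, "action": action, "observation": observation})
    raise RuntimeError("step budget exhausted before an answer was produced")


# Invented demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streams (user_id TEXT, ms_played INTEGER)")
conn.executemany("INSERT INTO streams VALUES (?, ?)",
                 [("a", 1000), ("b", 2500), ("a", 500)])
clusters = [
    {"name": "streams_cluster", "keywords": {"listeners", "streams"},
     "example_sql": "SELECT COUNT(DISTINCT user_id) FROM streams"},
    {"name": "billing_cluster", "keywords": {"invoice", "revenue"},
     "example_sql": "SELECT 0"},
]
result = react_answer("how many distinct listeners", clusters, conn)
print(result["answer"], result["sources"])
```

Returning the SQL and the cluster name alongside the rows reflects the "answers with sources" step: a reader can check which curated context produced the result.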

Best for: AI Product Manager, Product Manager, CTO, AI Engineer, Data Scientist, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Spotify Engineering.