How YouTube Finds Your Next Video in Milliseconds
Summary
The article, part of the "RecSys for MLEs" series, details the Two-Tower Model, a dominant architecture for candidate generation in large-scale recommendation systems like YouTube, Pinterest, and Airbnb. It addresses the "scale problem": brute-force scoring of billions of items for billions of users within milliseconds is computationally infeasible. The Two-Tower design decouples user and item representations, so item embeddings can be precomputed offline and only the user embedding has to be computed at request time. This enables sub-millisecond retrieval through approximate nearest neighbor search. The discussion covers the model's origins in Microsoft's Deep Structured Semantic Model (DSSM) for web search, YouTube's canonical 2016 implementation, and practical aspects like In-Batch Negatives for training, LogQ Correction for debiasing, and Hard Negative Mining for optimization. A PyTorch implementation using the MovieLens-1M dataset is also included.
Key takeaway
For Machine Learning Engineers building large-scale recommendation systems, a Two-Tower architecture is a practical way to reach sub-millisecond retrieval times. Decouple user and item embedding computations so that item embeddings can be precomputed offline, which significantly reduces online inference load. Consider In-Batch Negatives to make training efficient and LogQ Correction to reduce popularity bias, both of which can help the system generalize to real-world user behavior.
Key insights
Two-tower models enable scalable recommendations by decoupling user and item embeddings for offline precomputation.
Principles
- Decouple retrieval and ranking stages.
- Precompute item embeddings offline for speed.
- Map semantic similarity to vector proximity.
Method
The Two-Tower method involves independently embedding users and items into a shared vector space, precomputing item embeddings, and then using approximate nearest neighbor search for fast online retrieval.
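The flow described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the article's PyTorch implementation: the towers are untrained random linear layers, all sizes and names (`W_user`, `W_item`, `item_index`, `retrieve`) are hypothetical, and exact dot-product search stands in for the approximate nearest neighbor index a production system would use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_ITEMS, USER_FEATS, ITEM_FEATS, DIM = 1000, 16, 24, 8

# Stand-in "towers": one random linear layer each. A trained model would
# learn these weights; here they are untrained placeholders.
W_user = rng.normal(size=(USER_FEATS, DIM))
W_item = rng.normal(size=(ITEM_FEATS, DIM))


def l2_normalize(x):
    """Scale vectors to unit length so the dot product is cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def user_tower(user_features):
    return l2_normalize(user_features @ W_user)


def item_tower(item_features):
    return l2_normalize(item_features @ W_item)


# Offline: embed every item once and store the result.
item_features = rng.normal(size=(N_ITEMS, ITEM_FEATS))
item_index = item_tower(item_features)  # shape (N_ITEMS, DIM)


def retrieve(user_features, k=5):
    """Online: embed one user, then score against the precomputed index.

    Exact search is used here for clarity; it is an assumption of this
    sketch, standing in for an approximate nearest neighbor index.
    """
    u = user_tower(user_features)              # shape (DIM,)
    scores = item_index @ u                    # shape (N_ITEMS,)
    top = np.argpartition(-scores, k - 1)[:k]  # unordered top-k item ids
    return top[np.argsort(-scores[top])]       # ordered by descending score


top_items = retrieve(rng.normal(size=USER_FEATS), k=5)
```

The point of the split is visible in `retrieve`: the item tower never runs at request time, so online cost is one user-tower forward pass plus a nearest-neighbor lookup.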
In practice
- Use In-Batch Negatives for efficient training.
- Apply LogQ Correction to mitigate popularity bias.
- Implement Hard Negative Mining for better discrimination.
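The three techniques above can be illustrated with a short NumPy sketch. It is written under assumptions rather than taken from the article: the function names, the `item_log_q` argument (log of each item's estimated probability of appearing in a batch), the choice to apply the correction to every column including the positive, and the "highest-scoring non-positive items" mining heuristic are all illustrative choices.

```python
import numpy as np


def in_batch_softmax_loss(user_emb, item_emb, item_log_q=None, temperature=1.0):
    """Softmax loss with in-batch negatives.

    Row i of user_emb is paired with row i of item_emb (the positive);
    every other item in the batch acts as a negative for user i.
    If item_log_q is given, it is subtracted from the logits (the logQ
    correction), which offsets the fact that popular items show up as
    in-batch negatives more often than rare ones.
    """
    logits = (user_emb @ item_emb.T) / temperature      # shape (B, B)
    if item_log_q is not None:
        logits = logits - item_log_q[None, :]           # correct per item column
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                 # positives are on the diagonal


def mine_hard_negatives(user_emb, item_index, positive_ids, n_neg=2):
    """Return, per user, the ids of the n_neg highest-scoring non-positive items.

    One simple heuristic, assumed here for illustration only.
    """
    scores = user_emb @ item_index.T                    # shape (B, N_ITEMS)
    scores[np.arange(len(positive_ids)), positive_ids] = -np.inf  # mask positives
    return np.argsort(-scores, axis=1)[:, :n_neg]
```

A sampled hard negative can then be appended to the batch's item matrix so the softmax sees it alongside the in-batch negatives; how many to add, and how hard they should be, is a tuning decision this sketch does not settle.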
Topics
- Recommendation Systems
- Two-Tower Models
- Retrieval Systems
- Scalability
- Deep Structured Semantic Model
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by MLWhiz: Recs|ML|GenAI.