GenPage: Towards End-to-End Generative Homepage Construction at Netflix
Summary
Netflix has introduced GenPage, an end-to-end generative system for constructing personalized homepages that replaces its traditional multi-stage recommendation pipeline. GenPage uses a single decoder-only transformer that autoregressively generates the entire homepage (rows, entities, and layout) conditioned on the user context and the request. The approach targets whole-page optimization, better scaling, and greater flexibility. The system uses custom tokenization for computational efficiency and product control, and its training recipe is pretraining followed by post-training via Weighted Binary Classification (WBC) or Reinforcement Learning (RL). In online A/B tests against Netflix's mature production recommender, GenPage delivered statistically significant gains in core user engagement metrics and reduced end-to-end serving latency by 20%. Offline analysis found that enriching the prompt improved WBC loss by 6.9%, considerably more than the 1.3% loss reduction from scaling model capacity from 120M to 900M parameters.
Key takeaway
For AI/ML engineers designing or optimizing large-scale personalized recommendation systems, an end-to-end generative approach like GenPage is worth considering. It can simplify complex multi-stage pipelines; in Netflix's online A/B tests it also delivered statistically significant engagement gains and cut end-to-end serving latency by 20%. Prioritize enriching the model's context and prompt, which in Netflix's offline analysis yielded larger gains than scaling model capacity alone.
Key insights
A single generative transformer can replace complex multi-stage recommenders for structured, whole-page optimization.
Principles
- End-to-end generative models simplify complex ML stacks.
- Prompt enrichment can outweigh model capacity scaling.
- RL post-training enables whole-page optimization.
Method
GenPage tokenizes user context and autoregressively generates homepages using a decoder-only transformer. Training involves pretraining via next-token prediction, then post-training with WBC or RL for page-level optimization.
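As a rough illustration of the WBC post-training objective named above, the sketch below computes a weighted binary cross-entropy in plain Python. The summary does not give GenPage's exact formulation, so the per-example weights, the normalization by total weight, and the function name `weighted_bce` are assumptions made for illustration only.

```python
import math

def weighted_bce(logits, labels, weights):
    """Weighted binary cross-entropy over examples, normalized by total weight.

    Hypothetical sketch: logits are raw model scores, labels are 0 or 1
    (e.g. whether the member engaged), weights are per-example importances.
    """
    total, weight_sum = 0.0, 0.0
    for z, y, w in zip(logits, labels, weights):
        # Numerically stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        loss = max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        total += w * loss
        weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("weights must not sum to zero")
    return total / weight_sum

# Two positive examples: the first is scored well and weighted 3x,
# the second is scored badly and weighted 1x.
example_loss = weighted_bce([2.0, -2.0], [1, 1], [3.0, 1.0])
```

Raising the weight of an example makes its error count for more in the average, which is one simple way to emphasize higher-value engagement signals during post-training.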
In practice
- Use custom tokenization for domain-specific data.
- Employ multi-cadence incremental training for freshness.
- Enforce business rules via constrained decoding.
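The constrained-decoding item above can be sketched as a greedy decoder that filters out disallowed tokens before picking the next one. This is a minimal toy under stated assumptions, not Netflix's implementation: `constrained_greedy_decode`, `toy_scores`, `business_rule`, the token names, and the two rules (no blocked titles, no duplicates) are all hypothetical.

```python
def constrained_greedy_decode(score_fn, is_allowed, max_len, eos="<eos>"):
    """Greedy decoding that only considers tokens permitted by is_allowed.

    score_fn(prefix) -> dict mapping each candidate token to a score.
    is_allowed(prefix, token) -> bool, encoding the business rules.
    The end-of-sequence token is always permitted so decoding can stop.
    """
    page = []
    for _ in range(max_len):
        scores = score_fn(page)
        allowed = {tok: s for tok, s in scores.items()
                   if tok == eos or is_allowed(page, tok)}
        if not allowed:
            break
        best = max(allowed, key=allowed.get)
        if best == eos:
            break
        page.append(best)
    return page

def toy_scores(prefix):
    # Stand-in for a model: fixed scores regardless of prefix (assumption).
    return {"title_a": 3.0, "title_b": 2.0, "title_c": 1.0, "<eos>": 0.0}

BLOCKED = {"title_b"}  # hypothetical rule: title_b may not be shown

def business_rule(prefix, token):
    # Disallow blocked titles and titles already placed on the page.
    return token not in BLOCKED and token not in prefix

page = constrained_greedy_decode(toy_scores, business_rule, max_len=5)
print(page)  # ['title_a', 'title_c']
```

Because the rules are applied at every decoding step, the output cannot violate them regardless of what the model scores highest; a production system would typically apply the same mask to logits inside beam search or sampling.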
Topics
- Generative Recommenders
- Transformer Models
- Reinforcement Learning
- Prompt Engineering
- Netflix Personalization
- End-to-End ML
Best for: AI Architect, MLOps Engineer, AI Scientist, Machine Learning Engineer, AI Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Netflix TechBlog - Medium.