GenPage: Towards End-to-End Generative Homepage Construction at Netflix

Source: Netflix TechBlog - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Advanced, extended

Summary

Netflix has introduced GenPage, an end-to-end generative system for constructing personalized homepages that replaces its traditional multi-stage recommendation pipeline. GenPage uses a single decoder-only transformer that autoregressively generates the entire homepage (rows, entities, and layout) conditioned on the user context and the request. The approach targets whole-page optimization, better scaling, and greater flexibility. Custom tokenization provides computational efficiency and product control, and the training recipe is pretraining followed by post-training with Weighted Binary Classification (WBC) or Reinforcement Learning (RL). In online A/B tests against Netflix's mature production recommender, GenPage delivered statistically significant gains in core user engagement metrics and reduced end-to-end serving latency by 20%. Offline analysis found that enriching the prompt improved WBC loss by 6.9%, considerably more than the 1.3% loss reduction from scaling model capacity from 120M to 900M parameters.

Key takeaway

For AI/ML engineers designing or optimizing large-scale personalized recommendation systems, an end-to-end generative approach like GenPage is worth considering. At Netflix it replaced a multi-stage pipeline with a single model, delivered statistically significant engagement gains, and cut end-to-end serving latency by 20% in online A/B tests. Prioritize enriching the context supplied in the model's prompt: in Netflix's offline analysis, a richer prompt improved loss more than scaling the model from 120M to 900M parameters did.

Key insights

A single generative transformer can replace complex multi-stage recommenders for structured, whole-page optimization.

Method

GenPage tokenizes the user context and request, then generates the homepage autoregressively with a decoder-only transformer. Training has two stages: pretraining via next-token prediction, then post-training with Weighted Binary Classification (WBC) or Reinforcement Learning (RL) for page-level optimization.
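
The two ideas in this method can be illustrated with a small, hedged sketch; it is not Netflix's implementation. The first function shows greedy autoregressive decoding of a page as a token sequence, with a rule-based stub (`toy_scores`) standing in for the transformer. The second shows one plausible reading of a weighted binary classification objective, namely weighted binary cross-entropy. The vocabulary, the `<row>` and `<end_page>` tokens, the prompt tokens, and every function name here are hypothetical, and the exact WBC formulation used by GenPage is an assumption.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical toy vocabulary; GenPage's real tokenization is not specified here.
TITLES: List[str] = ["title_a", "title_b", "title_c"]
VOCAB: List[str] = ["<row>"] + TITLES + ["<end_page>"]


def generate_page(
    next_token_scores: Callable[[Sequence[str]], Dict[str, float]],
    prompt: Sequence[str],
    max_new_tokens: int = 16,
) -> List[str]:
    """Greedy autoregressive decoding: append the top-scoring token each step."""
    sequence = list(prompt)
    generated: List[str] = []
    for _ in range(max_new_tokens):
        scores = next_token_scores(sequence)
        token = max(scores, key=scores.get)
        generated.append(token)
        sequence.append(token)  # the new token conditions the next step
        if token == "<end_page>":
            break
    return generated


def _row_len(sequence: Sequence[str]) -> int:
    """Number of tokens emitted since the most recent <row> marker."""
    count = 0
    for tok in reversed(sequence):
        if tok == "<row>":
            break
        count += 1
    return count


def toy_scores(sequence: Sequence[str]) -> Dict[str, float]:
    """Stub for the model (assumption): rows of up to two unseen titles."""
    scores = {tok: 0.0 for tok in VOCAB}
    unused = [t for t in TITLES if t not in sequence]
    last = sequence[-1] if sequence else ""
    if not unused:
        scores["<end_page>"] = 1.0
    elif last == "<row>" or (last in TITLES and _row_len(sequence) < 2):
        scores[unused[0]] = 1.0
    else:
        scores["<row>"] = 1.0
    return scores


def weighted_bce(
    probs: Sequence[float], labels: Sequence[int], weights: Sequence[float]
) -> float:
    """Weighted binary cross-entropy, normalized by the total weight.

    Assumed stand-in for a WBC post-training loss; the source does not
    define the exact objective or how weights are chosen.
    """
    eps = 1e-12
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)


page = generate_page(toy_scores, prompt=["user:u1", "ctx:evening"])
loss = weighted_bce(probs=[0.5, 0.5], labels=[1, 0], weights=[1.0, 3.0])
```

In this sketch the layout is produced in the same pass as the content, because row markers are ordinary tokens in the sequence. A real system would replace `toy_scores` with model logits and would likely constrain decoding (for example, to prevent duplicate titles) rather than rely on the scoring function alone.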

Best for: AI Architect, MLOps Engineer, AI Scientist, Machine Learning Engineer, AI Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Netflix TechBlog - Medium.