GenRec: A Preference-Oriented Generative Framework for Large-Scale Recommendation
Summary
GenRec is a preference-oriented generative framework for large-scale recommendation, deployed on the JD App, that addresses three key challenges in Generative Retrieval (GR): inconsistent outputs for identical model inputs caused by pagination, the high cost of encoding long user behavior sequences when each item is represented by multiple tokens, and the difficulty of aligning generative policies with nuanced user preferences. GenRec employs a decoder-only architecture and introduces a Page-wise NTP task that yields denser gradient signals and resolves the resulting one-to-many ambiguity. It also features an asymmetric linear Token Merger that compresses multi-token Semantic IDs by roughly 2× with minimal accuracy loss. Finally, GenRec incorporates GRPO-SR, a reinforcement learning method that combines Group Relative Policy Optimization with NLL regularization and Hybrid Rewards to improve training stability and mitigate reward hacking. In month-long online A/B tests, GenRec improved click count by 9.5% and transaction count by 8.7% over the existing production pipeline.
Key takeaway
For research scientists developing large-scale generative recommendation systems, GenRec offers a validated approach to common deployment hurdles. Consider combining page-wise supervision, token merging for input efficiency, and reinforcement learning with hybrid rewards to improve training stability and align outputs with user preferences; in GenRec's month-long online A/B tests, this combination raised click count by 9.5% and transaction count by 8.7%.
Key insights
GenRec is a generative recommendation framework optimizing for user preferences and large-scale deployment challenges.
Principles
- Supervise over entire interaction pages for denser gradients.
- Compress multi-token IDs while preserving decoding resolution.
- Align generative policy with user satisfaction via RL.
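To illustrate the first principle, here is a minimal sketch of how page-wise supervision could change the training loss. It assumes, for simplicity, that each item is a single token (GenRec itself uses multi-token Semantic IDs), and that every item the user interacted with on a page is a valid target for the same context. Instead of arbitrarily picking one of those items as the next-token target, the loss averages the negative log-likelihood over all of them. `page_wise_nll` and `single_target_nll` are hypothetical helpers, not GenRec's code.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    # Numerically stable log-softmax over the (toy) item vocabulary.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def single_target_nll(logits: Sequence[float], target: int) -> float:
    # Conventional NTP: one target item per context; other page items are ignored.
    return -log_softmax(logits)[target]


def page_wise_nll(logits: Sequence[float], page_targets: Sequence[int]) -> float:
    # Assumed page-wise variant: all interacted items on the page share the same
    # context, so each contributes a gradient signal and the loss is their mean NLL.
    if not page_targets:
        raise ValueError("a page must contain at least one target item")
    logp = log_softmax(logits)
    return -sum(logp[t] for t in page_targets) / len(page_targets)


# Usage: one context, a page on which the user engaged with items 0 and 2.
logits = [2.0, 1.0, 0.5, 0.1]
print(single_target_nll(logits, 0), page_wise_nll(logits, [0, 2]))
```

Averaging over the page keeps the loss scale independent of page size; a sum would weight busy pages more heavily, which is an equally plausible design choice.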
Method
GenRec uses a decoder-only architecture with Page-wise NTP for training, an asymmetric linear Token Merger for input compression, and GRPO-SR (Group Relative Policy Optimization with NLL regularization and Hybrid Rewards) for preference alignment.
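The summary gives only the Token Merger's name and its roughly 2× compression ratio. The following is a minimal sketch under two assumptions: "asymmetric" means compression is applied to the input sequence only, while the decoder still predicts every Semantic ID token at full resolution; and merging concatenates the embeddings of adjacent token pairs and projects them back to model width with one shared linear layer. `merge_tokens`, `linear`, and the random weights are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def linear(weight: List[Vector], bias: Vector, x: Vector) -> Vector:
    # y = W x + b, with W stored row-major as (out_dim, in_dim).
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def merge_tokens(token_embs: List[Vector], weight: List[Vector], bias: Vector,
                 group: int = 2) -> List[Vector]:
    # Concatenate each run of `group` consecutive token embeddings and project the
    # concatenation (group * d values) back to d values with one shared linear map.
    # If every item has an even number of tokens, pairs never straddle two items.
    if len(token_embs) % group:
        raise ValueError("sequence length must be a multiple of the group size")
    merged = []
    for i in range(0, len(token_embs), group):
        concat = [v for emb in token_embs[i:i + group] for v in emb]
        merged.append(linear(weight, bias, concat))
    return merged


# Usage: 3 items x 4 Semantic ID tokens each, embedding width d = 4.
random.seed(0)
d, tokens_per_item, n_items = 4, 4, 3
seq = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(tokens_per_item * n_items)]
weight = [[random.gauss(0.0, 0.1) for _ in range(2 * d)] for _ in range(d)]
bias = [0.0] * d
merged = merge_tokens(seq, weight, bias)
print(len(seq), "->", len(merged))  # 12 -> 6
```

Only the encoder-side sequence shrinks here; in this sketch the output head is untouched, which is one way to "compress multi-token IDs while preserving decoding resolution."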
In practice
- Implement Page-wise NTP for improved gradient signals.
- Utilize Token Merger for efficient input processing.
- Apply GRPO-SR to align generative models with user satisfaction.
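GRPO-SR is only named here, not specified. The sketch below assumes the standard GRPO recipe: sample a group of candidate outputs for one user context, score them, normalize each reward against its own group, and optimize a PPO-style clipped surrogate. It then adds the NLL term as a supervised regularizer on the logged ground-truth item. The hybrid reward is modeled as a hypothetical weighted blend of a learned preference score and a rule-based score. `hybrid_reward`, `alpha`, `nll_coef`, and `clip_eps` are illustrative names, and the usual KL penalty to a reference policy is omitted for brevity.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def hybrid_reward(model_score: float, rule_score: float, alpha: float = 0.5) -> float:
    # Hypothetical hybrid reward: blend a learned preference score with a
    # rule-based score so neither signal alone can be gamed (reward hacking).
    return alpha * model_score + (1.0 - alpha) * rule_score


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    # GRPO: each sample's advantage is its reward standardized within its group.
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_sr_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                 rewards: Sequence[float], nll_target: float,
                 clip_eps: float = 0.2, nll_coef: float = 0.1) -> float:
    # Clipped policy-gradient surrogate over a group of sampled sequences,
    # plus an NLL regularizer on the ground-truth item (assumed form).
    advs = group_advantages(rewards)
    surrogate = []
    for ln, lo, a in zip(logp_new, logp_old, advs):
        ratio = math.exp(ln - lo)
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        surrogate.append(min(ratio * a, clipped * a))
    return -mean(surrogate) + nll_coef * nll_target


# Usage: three sampled recommendations for one user context.
rewards = [hybrid_reward(m, r) for m, r in [(0.9, 1.0), (0.2, 1.0), (0.6, 0.0)]]
loss = grpo_sr_loss(logp_new=[-1.0, -2.1, -1.6], logp_old=[-1.1, -2.0, -1.5],
                    rewards=rewards, nll_target=2.3)
print(loss)
```

Because advantages are relative within the group, GRPO needs no learned value function; the NLL term keeps the policy anchored to logged user behavior while the RL term pushes toward higher-reward outputs.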
Topics
- Generative Recommendation
- Large-Scale Systems
- Next-Token Prediction
- Reinforcement Learning
- User Preference Alignment
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.