GenRec: A Preference-Oriented Generative Framework for Large-Scale Recommendation

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, medium

Summary

GenRec is a preference-oriented generative framework for large-scale recommendation, deployed on the JD App, that addresses three challenges in Generative Retrieval (GR). First, identical model inputs can yield inconsistent outputs because of pagination. Second, encoding long user behavior sequences is costly when each item is represented by multiple tokens. Third, the generative policy has to be aligned with nuanced user preferences. GenRec uses a decoder-only architecture and introduces a Page-wise next-token prediction (NTP) task, which provides denser gradient signals and resolves the one-to-many ambiguity. An asymmetric linear Token Merger compresses multi-token Semantic IDs by approximately 2× with minimal accuracy loss. For preference alignment, GenRec incorporates GRPO-SR, a reinforcement learning method that combines Group Relative Policy Optimization (GRPO) with negative log-likelihood (NLL) regularization and Hybrid Rewards to improve training stability and mitigate reward hacking. In month-long online A/B tests, GenRec improved click count by 9.5% and transaction count by 8.7% over the existing production pipeline.
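To make the one-to-many ambiguity concrete, here is a minimal Python sketch that contrasts point-wise and page-wise training examples. It is an illustration under stated assumptions, not the paper's formulation: the function names, the item identifiers, and the choice to average the loss over all tokens on a page are hypothetical.

```python
import math
from collections import defaultdict


def pointwise_examples(history, page):
    """Point-wise view: one (input, target) pair per interacted item."""
    return [(tuple(history), item) for item in page]


def pagewise_example(history, page):
    """Page-wise view: one example whose target is the whole page."""
    return (tuple(history), tuple(page))


def pagewise_nll(token_log_probs):
    """Mean negative log-likelihood over every token of every item on a page.

    token_log_probs: one list per page item, holding the log-probability the
    model assigned to each of that item's Semantic ID tokens (hypothetical input).
    """
    flat = [lp for item in token_log_probs for lp in item]
    return -sum(flat) / len(flat)


history = ["item_a", "item_b"]          # hypothetical behavior sequence
page = ["item_x", "item_y", "item_z"]   # items interacted with on one page

# Point-wise: the same input shows up three times with three different targets.
targets_by_input = defaultdict(set)
for context, target in pointwise_examples(history, page):
    targets_by_input[context].add(target)

# Page-wise: the input appears once and all three items are supervised together.
context, targets = pagewise_example(history, page)

# Two items with two tokens each, every token predicted with probability 0.5.
loss = pagewise_nll([[math.log(0.5)] * 2, [math.log(0.5)] * 2])
```

In the point-wise view a single input is paired with three competing targets, which is the ambiguity the summary refers to. In the page-wise view the same input carries one joint target, and every token on the page contributes to the loss.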

Key takeaway

For research scientists developing large-scale generative recommendation systems, GenRec offers a validated approach to overcome common deployment hurdles. You should consider integrating page-wise supervision, token merging for input efficiency, and reinforcement learning with hybrid rewards to improve model stability and align outputs with user preferences, potentially yielding significant uplifts in key business metrics like click and transaction counts.
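As a rough picture of what token merging for input efficiency can look like, the sketch below merges each adjacent pair of token embeddings with a single linear map, halving the sequence length. This is an assumption-laden illustration: the pairing scheme, the `linear_merge` name, and the toy weights are hypothetical, and the summary does not spell out what "asymmetric" means, so the sketch covers only input-side compression.

```python
def linear_merge(embeddings, weight, bias):
    """Merge adjacent pairs of token embeddings with one linear projection.

    embeddings: list of T vectors, each of dimension d (T must be even)
    weight:     d rows of 2*d columns, applied to the concatenated pair
    bias:       vector of dimension d
    Returns T/2 vectors of dimension d.
    """
    if len(embeddings) % 2 != 0:
        raise ValueError("expected an even number of token embeddings")
    merged = []
    for i in range(0, len(embeddings), 2):
        pair = embeddings[i] + embeddings[i + 1]  # concatenation, length 2*d
        merged.append(
            [sum(w * x for w, x in zip(row, pair)) + b
             for row, b in zip(weight, bias)]
        )
    return merged


# Toy setup: d = 2, four token embeddings, weights that average each pair.
tokens = [[1, 2], [3, 4], [5, 6], [7, 8]]
weight = [[0.5, 0.0, 0.5, 0.0],
          [0.0, 0.5, 0.0, 0.5]]
bias = [0.0, 0.0]

merged = linear_merge(tokens, weight, bias)
```

With these toy weights the four input vectors become two, which mirrors the roughly 2× reduction reported in the summary. In a trained model the weights would be learned rather than fixed to an average.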

Key insights

GenRec is a generative recommendation framework designed to align with user preferences and to handle the challenges of large-scale deployment.

Method

GenRec uses a decoder-only architecture with Page-wise NTP for training, an asymmetric linear Token Merger for input compression, and GRPO-SR (Group Relative Policy Optimization with NLL regularization and Hybrid Rewards) for preference alignment.
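The preference-alignment step can be sketched in Python as follows. The group-relative advantage is the standard GRPO normalization (each reward minus the group mean, divided by the group standard deviation). Everything else is a simplifying assumption: the loss omits the clipping and KL terms of full GRPO, the `nll_weight` parameter and function names are hypothetical, and the rewards are taken as given because the summary does not describe how the Hybrid Rewards are composed.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled candidates."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def regularized_loss(advantages, sample_log_probs, target_log_prob, nll_weight=0.1):
    """Simplified policy-gradient loss plus an NLL regularizer.

    advantages:       group-relative advantages, one per sampled candidate
    sample_log_probs: log-probability of each sampled candidate under the policy
    target_log_prob:  log-probability of the logged (ground-truth) item
    nll_weight:       hypothetical weight on the NLL term
    """
    policy_term = -sum(
        a * lp for a, lp in zip(advantages, sample_log_probs)
    ) / len(advantages)
    nll_term = -target_log_prob
    return policy_term + nll_weight * nll_term


# Four sampled candidates for one request, with rewards assumed to be given.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Two candidates with advantages +1 and -1, plus the logged item's log-probability.
loss = regularized_loss([1.0, -1.0], [-1.0, -2.0], target_log_prob=-0.5)
```

Because advantages are computed relative to the group, candidates are rewarded only for beating their siblings. The NLL term keeps the policy anchored to logged behavior, which is one plausible reading of how it supports training stability.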

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.