OneReason Technical Report
Summary
OneReason is a novel generative recommendation model designed to overcome the limited reasoning capabilities of existing OneRec family models, which are widely deployed in services such as short-video, live-streaming, advertising, and e-commerce. Previous generative recommendation models struggle to activate reasoning because meaningful Chain-of-Thought (CoT) sequences cannot be constructed from itemic tokens alone. Initial explorations inspired by the LLM "think before answer" paradigm, such as OneRec-Think and OpenOneRec, unexpectedly showed no advantage. OneReason addresses this through two factors: perception, which grounds itemic tokens in their underlying language semantics during pre-training, and cognition, which reorganizes user behavior sequences into coherent latent interest points. Its training uses a three-level cognition-enhanced CoT format for recommendation tasks in supervised fine-tuning (SFT) and a specialize-then-unify training recipe in reinforcement learning (RL). This technical report was published on 2026-06-04.
Key takeaway
For Machine Learning Engineers developing generative recommendation systems, OneReason's approach suggests that simply applying the LLM-style "think before answer" paradigm is insufficient. Focus instead on explicitly building itemic token perception and user behavior cognition into your training pipeline. Consider pre-training that grounds itemic tokens in language semantics, a multi-level Chain-of-Thought format in SFT, and a specialize-then-unify RL training recipe to activate reasoning in your recommendation models.
Key insights
Effective generative recommendation reasoning requires both itemic token perception and user behavior cognition.
Principles
- Reasoning needs semantic grounding for itemic tokens.
- User behavior sequences benefit from cognitive reorganization.
- The "think before answer" paradigm needs adaptation for recommendation.
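As a rough illustration of the cognition principle, the sketch below regroups a chronological behavior sequence into per-topic groups. This is only a stand-in: the report describes latent interest points, whereas this example assumes each behavior already carries an explicit topic label. The function name, the `(item_id, topic)` schema, and the sample data are all hypothetical.

```python
from collections import OrderedDict


def group_into_interest_points(behaviors):
    """Group a chronological behavior list by a coarse topic label.

    `behaviors` is a list of (item_id, topic) tuples. Both fields are
    hypothetical stand-ins, not the report's actual data schema.
    """
    groups = OrderedDict()
    for item_id, topic in behaviors:
        # Keep topics in order of first appearance and items in time order.
        groups.setdefault(topic, []).append(item_id)
    return [{"interest": topic, "items": items} for topic, items in groups.items()]


# Hypothetical interleaved watch history.
history = [
    ("v101", "cooking"),
    ("v202", "basketball"),
    ("v103", "cooking"),
    ("v204", "basketball"),
    ("v305", "travel"),
]
interest_points = group_into_interest_points(history)
```

The interleaved history becomes three groups, one per topic, which is easier to reason over step by step than the raw sequence. A real system would presumably infer the grouping rather than read it from labels.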
Method
OneReason employs pre-training for itemic token perception, a three-level cognition-enhanced CoT format in SFT, and a specialize-then-unify training recipe in RL to enhance reasoning.
In practice
- Ground itemic tokens in their language semantics during pre-training.
- Design multi-level CoT for recommendation tasks.
- Apply RL with a specialize-then-unify training recipe.
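One way to picture the perception step is as training data that pairs an item's itemic tokens with a plain-language description, so the model learns what each token sequence refers to. The sketch below is a minimal illustration under that assumption. The `<a_12><b_7><c_3>` token syntax, the prompt wording, and both function names are hypothetical and are not taken from the report.

```python
def format_itemic_tokens(codes):
    """Render a tuple of integer codes as special tokens, e.g. <a_12><b_7><c_3>.

    The letter-per-level token syntax is an assumption made for this sketch.
    """
    levels = "abcdefgh"
    if len(codes) > len(levels):
        raise ValueError("too many code levels for this sketch")
    return "".join(f"<{levels[i]}_{c}>" for i, c in enumerate(codes))


def build_perception_example(codes, description):
    """Pair an item's token sequence with its text description."""
    tokens = format_itemic_tokens(codes)
    return {
        "prompt": f"Describe the item {tokens}.",
        "target": description,
    }


# Hypothetical item: three-level code plus a human-written description.
example = build_perception_example(
    (12, 7, 3),
    "A short video on making sourdough bread at home.",
)
```

Examples of this shape could be mixed into a pre-training corpus. The reverse direction (description in, itemic tokens out) is an equally plausible variant.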
Topics
- Generative Recommendation
- Chain-of-Thought
- Reinforcement Learning
- Large Language Models
- Item Perception
- User Cognition
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.