TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds
Summary
TokenFormer is a unified recommendation architecture designed to bridge multi-field categorical feature interaction models and sequential user behavior models. Historically, these two paradigms developed independently, and naive attempts to combine them often lead to "Sequential Collapse Propagation" (SCP), in which non-sequence fields cause dimensional collapse of the sequence features. TokenFormer addresses this with two components. The first is a Bottom-Full-Top-Sliding (BFTS) attention scheme, which uses full self-attention in the lower layers and sliding-window attention with shrinking windows in the upper layers. The second is a Non-Linear Interaction Representation (NLIR), which applies one-sided non-linear multiplicative transformations to hidden states. Experiments on public benchmarks and on Tencent's advertising platform report state-of-the-art performance, along with better dimensional robustness and more discriminative representations.
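The BFTS layer schedule can be pictured as a per-layer attention mask. What follows is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The helper names (`bfts_window_sizes`, `band_mask`, `masked_self_attention`), the number of full-attention layers, the halving schedule for the shrinking window, and the symmetric (non-causal) band are all hypothetical choices. Queries, keys, and values are also taken to be the raw token states to keep the example short.

```python
import numpy as np


def bfts_window_sizes(num_layers, num_full, base_window, min_window=1):
    """Per-layer window radius; None means full self-attention.

    Assumed schedule: the bottom `num_full` layers use full attention and each
    sliding layer above them halves the window, never going below `min_window`.
    """
    sizes = []
    for layer in range(num_layers):
        if layer < num_full:
            sizes.append(None)
        else:
            step = layer - num_full  # 0 for the first sliding layer
            sizes.append(max(min_window, base_window >> step))
    return sizes


def band_mask(seq_len, window):
    """Boolean mask where True means query i may attend to key j."""
    if window is None:
        return np.ones((seq_len, seq_len), dtype=bool)
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window


def masked_self_attention(x, mask):
    """Single-head scaled dot-product attention with Q = K = V = x (simplification)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # the diagonal is always allowed
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x


rng = np.random.default_rng(0)
tokens = rng.normal(size=(10, 16))  # 10 tokens (fields + behaviors), width 16
windows = bfts_window_sizes(num_layers=6, num_full=2, base_window=8)
for window in windows:
    # Residual update per layer; normalization and projections are omitted.
    tokens = tokens + masked_self_attention(tokens, band_mask(len(tokens), window))
```

With these settings the schedule is `[None, None, 8, 4, 2, 1]`: two full-attention layers at the bottom, followed by sliding windows of radius 8, 4, 2, and 1. Under this assumed schedule, early layers see every field and behavior token, while upper layers are restricted to progressively more local context. This is one way to realize the balance between global and local dependencies described above.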
Key takeaway
For AI Engineers developing recommender systems that combine multi-field and sequential data, TokenFormer's architecture is worth considering. Its BFTS attention and NLIR components directly target Sequential Collapse Propagation, offering a more robust and discriminative alternative to naive unification. On large, heterogeneous datasets such as advertising logs, this can improve model quality by keeping sequence features from collapsing into a low-dimensional subspace when they are modeled together with non-sequence fields.
Key insights
TokenFormer unifies multi-field and sequential recommendation while preventing "Sequential Collapse Propagation," using a hybrid attention scheme and a non-linear representation scheme.
Principles
- Naive unification of recommendation paradigms can cause dimensional collapse.
- Hybrid attention schemes can balance global and local dependencies.
- Non-linear transformations enhance representation discriminability.
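Dimensional collapse can be made concrete by measuring how many directions of the embedding space a set of representations actually uses. The sketch below computes the entropy-based effective rank of a representation matrix from its singular values. This is one common diagnostic, assumed here for illustration, and not necessarily the metric the TokenFormer authors report. The function name `effective_rank` is hypothetical.

```python
import numpy as np


def effective_rank(reps, center=True):
    """exp(entropy) of the normalized singular values of a (num_items, dim) matrix.

    Values near `dim` mean the representations spread across many directions;
    values near 1 indicate collapse onto roughly one direction.
    """
    x = np.asarray(reps, dtype=float)
    if center:
        x = x - x.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(x, compute_uv=False)
    total = singular_values.sum()
    if total == 0:
        return 0.0  # all-constant input: no direction is used at all
    p = singular_values / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(np.exp(-(p * np.log(p)).sum()))


rng = np.random.default_rng(0)
healthy = rng.normal(size=(1000, 32))
collapsed = np.outer(rng.normal(size=1000), rng.normal(size=32))  # rank 1
print(effective_rank(healthy), effective_rank(collapsed))
```

Tracking this number for sequence-token representations, with and without non-sequence fields mixed in, is one way to check whether a unified model suffers from the collapse the first principle warns about.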
Method
TokenFormer combines a Bottom-Full-Top-Sliding (BFTS) attention scheme, with full self-attention in the lower layers and shrinking-window sliding attention in the upper layers, with a Non-Linear Interaction Representation (NLIR) that applies one-sided non-linear multiplicative transformations to hidden states.
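The summary describes NLIR only as a one-sided non-linear multiplicative transformation of hidden states. One plausible reading, assumed here and not taken from the paper, is a gated product in which only one factor passes through a non-linearity, similar in shape to gated linear units. The sketch below implements that reading. The function name `nlir`, the weight names, the SiLU activation, and the absence of a residual path are all hypothetical.

```python
import numpy as np


def silu(z):
    """SiLU / swish activation: z * sigmoid(z)."""
    return z / (1.0 + np.exp(-z))


def nlir(hidden, w_linear, w_gate, act=silu):
    """Assumed one-sided multiplicative interaction.

    The linear branch stays linear; only the gate branch is passed through
    `act`, and the two branches are combined element-wise.
    """
    linear_branch = hidden @ w_linear
    gate_branch = act(hidden @ w_gate)
    return linear_branch * gate_branch


rng = np.random.default_rng(0)
hidden = rng.normal(size=(10, 16))  # 10 token states of width 16
w_linear = rng.normal(size=(16, 16)) / 4.0
w_gate = rng.normal(size=(16, 16)) / 4.0
out = nlir(hidden, w_linear, w_gate)
```

Because the gate depends on the input, each output coordinate is the product of a linear and a non-linear function of the same hidden state, which a single linear projection cannot express. Replacing `act` with a constant of one recovers the plain linear projection, and replacing it with the identity gives a bilinear interaction. Both are convenient ablations when testing whether the non-linear side is what improves discriminability.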
In practice
- Implement BFTS attention for robust sequence modeling.
- Apply NLIR for improved feature interaction.
- Evaluate unified models on production-scale advertising data, as the authors did on Tencent's platform.
Topics
- TokenFormer
- Recommender Systems
- Multi-Field Recommendation
- Sequential Recommendation
- Sequential Collapse Propagation
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.