TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds

2026-04-15 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

TokenFormer is a unified recommendation architecture designed to bridge multi-field categorical feature-interaction models and sequential user-behavior models. The two paradigms have historically developed independently, and naive attempts to combine them often produce "Sequential Collapse Propagation" (SCP), in which non-sequence fields cause dimensional collapse of the sequence features. TokenFormer addresses this with two components. The first is a Bottom-Full-Top-Sliding (BFTS) attention scheme, which uses full self-attention in the lower layers and sliding-window attention with a shrinking window in the upper layers. The second is a Non-Linear Interaction Representation (NLIR), which applies one-sided non-linear multiplicative transformations to hidden states. Experiments on public benchmarks and on Tencent's advertising platform show state-of-the-art performance, along with improved dimensional robustness and representation discriminability.

Key takeaway

AI engineers building recommender systems that combine multi-field and sequential data should consider TokenFormer's architecture. Its BFTS attention and NLIR components directly target Sequential Collapse Propagation, offering a more robust and discriminative approach than naive unification. On datasets that mix many input types, such as advertising logs, this may improve model performance by keeping sequence features from collapsing when non-sequence fields are added.

Key insights

TokenFormer unifies multi-field and sequential recommendation by preventing "Sequential Collapse Propagation" through novel attention and representation schemes.

Principles

Naive unification of recommendation paradigms can cause dimensional collapse.
Hybrid attention schemes can balance global and local dependencies.
Non-linear transformations enhance representation discriminability.
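
To make "dimensional collapse" concrete, one common diagnostic is the participation ratio of the embedding covariance matrix, which estimates how many dimensions actually carry variance. The sketch below is an illustration and is not taken from the TokenFormer paper: the function name `participation_ratio` and the synthetic `healthy` and `collapsed` embeddings are assumptions made for the example.

```python
import random

def participation_ratio(rows):
    """Effective dimensionality of a set of embedding vectors.

    Computes (trace C)^2 / trace(C^2) for the covariance matrix C.
    The value ranges from 1 (all variance on one axis) up to the
    embedding dimension (variance spread evenly over all axes).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    trace = sum(cov[i][i] for i in range(d))
    # C is symmetric, so trace(C^2) equals the sum of its squared entries.
    trace_sq = sum(v * v for row in cov for v in row)
    return trace * trace / trace_sq

rng = random.Random(0)

# Hypothetical healthy embeddings: independent Gaussian noise on 8 axes.
healthy = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2000)]

# Hypothetical collapsed embeddings: every vector lies along one direction.
direction = [rng.gauss(0, 1) for _ in range(8)]
scales = [rng.gauss(0, 1) for _ in range(2000)]
collapsed = [[s * x for x in direction] for s in scales]

pr_healthy = participation_ratio(healthy)      # close to 8
pr_collapsed = participation_ratio(collapsed)  # close to 1
print(pr_healthy, pr_collapsed)
```

Tracking a measure like this on sequence-token representations, before and after non-sequence fields are added, is one way to check whether a unified model shows the collapse described here.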

Method

TokenFormer uses a Bottom-Full-Top-Sliding (BFTS) attention scheme with full self-attention in lower layers and shrinking-window sliding attention in upper layers, combined with a Non-Linear Interaction Representation (NLIR) for hidden state transformations.
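
The layer-wise attention pattern described above can be sketched as a set of boolean masks. This is a minimal illustration under stated assumptions, not the authors' implementation: the number of full-attention layers, the halving schedule, the `min_window` floor, and the symmetric (non-causal) band are all hypothetical choices, since the summary does not specify them.

```python
def bfts_windows(num_layers, num_full_layers, seq_len, min_window=2):
    """Return one window size per layer; None means full self-attention.

    Assumed schedule: the window starts at seq_len and is halved at each
    sliding-window layer, never dropping below min_window.
    """
    windows = []
    window = seq_len
    for layer in range(num_layers):
        if layer < num_full_layers:
            windows.append(None)          # bottom layers: full attention
        else:
            window = max(min_window, window // 2)
            windows.append(window)        # top layers: shrinking window
    return windows

def attention_mask(seq_len, window):
    """mask[i][j] is True when token i may attend to token j."""
    if window is None:
        return [[True] * seq_len for _ in range(seq_len)]
    # Assumed symmetric band: token i sees tokens with |i - j| < window.
    return [[abs(i - j) < window for j in range(seq_len)]
            for i in range(seq_len)]

windows = bfts_windows(num_layers=6, num_full_layers=2, seq_len=16)
masks = [attention_mask(16, w) for w in windows]
print(windows)  # [None, None, 8, 4, 2, 2]
```

In a real model each mask would be passed to the attention call of its layer, so lower layers mix all tokens while upper layers restrict each token to a progressively narrower neighborhood.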

In practice

Implement BFTS attention for robust sequence modeling.
Apply NLIR for improved feature interaction.
Evaluate unified models on public benchmarks and on production-scale advertising data.
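
For the NLIR item above, the summary says only that a one-sided non-linear multiplicative transformation is applied to hidden states. One plausible reading, sketched here purely as an assumption, is an element-wise product in which one factor is the hidden state itself and the other passes through a non-linearity. The function `one_sided_interaction`, the sigmoid gate, and the weight matrix `W` are hypothetical and may differ from the paper's formulation.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_sided_interaction(h, W):
    """Assumed form: y = h * sigmoid(W h), element-wise.

    Only the gate side of the product is non-linear; the other side
    is the hidden state h, left untransformed.
    """
    gate = [sigmoid(z) for z in matvec(W, h)]
    return [a * g for a, g in zip(h, gate)]

h = [1.0, -2.0, 0.5]

# With zero weights the gate is 0.5 everywhere, so the output is h / 2.
zero_W = [[0.0] * 3 for _ in range(3)]
halved = one_sided_interaction(h, zero_W)
print(halved)  # [0.5, -1.0, 0.25]

# With non-zero weights the map is no longer linear in h.
rng = random.Random(0)
W = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
out = one_sided_interaction(h, W)
out_doubled = one_sided_interaction([2 * x for x in h], W)
```

Because the gate depends on the input, doubling `h` does not simply double the output, which is the kind of non-linear interaction the summary credits with improving representation discriminability.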

Topics

TokenFormer
Recommender Systems
Multi-Field Recommendation
Sequential Recommendation
Sequential Collapse Propagation

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.