TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds

2026-04-16 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

TokenFormer is a recommendation architecture that unifies multi-field categorical features with sequential user-behavior dynamics. It targets "Sequential Collapse Propagation" (SCP), a failure mode in which non-sequence fields degrade sequence features. The model introduces a "Bottom-Full-Top-Sliding" (BFTS) attention scheme that uses full self-attention in the lower layers and sliding-window attention with a shrinking window in the upper layers. It also adds a "Non-Linear Interaction Representation" (NLIR) that applies one-sided non-linear multiplicative transformations to hidden states. Experiments on public benchmarks such as KuaiRand-27K and on Tencent's advertising platform show state-of-the-art performance: the tiny version outperforms the Transformer baseline by 5.00‰ AUC and HSTU-Ultra by 2.05‰. In online A/B tests in the WeChat Channels advertising system, run from January to February 2026, it delivered a 4.03% uplift in GMV.

Key takeaway

For AI engineers and research scientists building large-scale recommender systems, TokenFormer offers a blueprint for unified modeling. Its BFTS attention and NLIR mechanisms mitigate Sequential Collapse Propagation, improving both accuracy and dimensional robustness. Consider evaluating the architecture where performance and efficiency both matter, especially in data-rich industrial environments, where the reported results show sustained scaling benefits.

Key insights

TokenFormer unifies multi-field and sequential recommendation by mitigating "Sequential Collapse Propagation" through novel attention and non-linear interaction mechanisms.

Principles

Unified modeling of all interaction types is crucial.
Hierarchical attention scopes improve efficiency and robustness.
Non-linear multiplicative interactions enhance representation discriminability.
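
As a rough illustration of hierarchical attention scopes, the sketch below builds one attention mask per layer: the lower layers see the whole sequence, and each upper layer sees a smaller window than the one before it. The function names, the symmetric window, the halving schedule, and every parameter value are assumptions made for illustration. The summary does not say how TokenFormer sizes its windows or whether they are causal.

```python
def bfts_windows(seq_len, num_layers, num_full_layers, first_window,
                 shrink=2, min_window=1):
    """Return one attention window size per layer (hypothetical schedule)."""
    windows = []
    window = first_window
    for layer in range(num_layers):
        if layer < num_full_layers:
            # Bottom layers: full self-attention over the whole sequence.
            windows.append(seq_len)
        else:
            # Top layers: sliding window that shrinks layer by layer.
            windows.append(max(min_window, min(window, seq_len)))
            window = max(min_window, window // shrink)
    return windows


def window_mask(seq_len, window):
    """Boolean mask: token i may attend to token j when |i - j| < window."""
    return [[abs(i - j) < window for j in range(seq_len)]
            for i in range(seq_len)]


def bfts_masks(seq_len, num_layers, num_full_layers, first_window):
    """Build the per-layer masks for the assumed schedule."""
    sizes = bfts_windows(seq_len, num_layers, num_full_layers, first_window)
    return [window_mask(seq_len, w) for w in sizes]


if __name__ == "__main__":
    for layer, size in enumerate(bfts_windows(8, 4, 2, 4)):
        print(f"layer {layer}: window {size}")
```

With these assumed settings, a window equal to the sequence length reduces to full attention, so both regimes share one mask function.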

Method

TokenFormer unifies static fields, behavior tokens, and target attributes into a single stream, processed by stacked Unified Interaction Blocks (UIBs) with BFTS attention and NLIR for multiplicative feature interaction.
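
To make the single-stream idea concrete, here is a minimal sketch that flattens the three input groups into one token list tagged by segment. The `Token` class, the `build_stream` helper, the field names, and the static-behavior-target ordering are all hypothetical; the summary does not describe TokenFormer's actual tokenization or ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    segment: str  # "static", "behavior", or "target" (assumed labels)
    name: str
    value: str


def build_stream(static_fields, behaviors, target_attrs):
    """Flatten all inputs into one token stream (assumed ordering)."""
    stream = [Token("static", k, str(v)) for k, v in static_fields.items()]
    stream += [Token("behavior", f"step_{i}", str(item))
               for i, item in enumerate(behaviors)]
    stream += [Token("target", k, str(v)) for k, v in target_attrs.items()]
    return stream


if __name__ == "__main__":
    stream = build_stream(
        {"age_bucket": "25-34", "city": "SZ"},  # illustrative static fields
        ["video_1", "video_2", "video_3"],      # illustrative behavior history
        {"ad_id": "ad_9"},                      # illustrative target attribute
    )
    for token in stream:
        print(token.segment, token.name, token.value)
```

Once everything lives in one stream, every block can attend across field, behavior, and target tokens without separate towers.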

In practice

Use BFTS attention for efficient long sequence modeling.
Implement NLIR to prevent representation collapse.
Consider a decoupled serving strategy for efficiency.
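
One possible reading of a "one-sided non-linear multiplicative transformation" is a gated product in which only one branch passes through a non-linearity. The sketch below follows that reading and should not be taken as the paper's exact formulation: the `nlir` function, the sigmoid gate, and the two weight matrices are assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def nlir(hidden, w_gate, w_value):
    """Assumed form: sigmoid(W_gate h) * (W_value h), element-wise."""
    gate = [sigmoid(g) for g in matvec(w_gate, hidden)]  # non-linear side
    value = matvec(w_value, hidden)                      # linear side
    return [g * v for g, v in zip(gate, value)]


if __name__ == "__main__":
    hidden = [1.0, -2.0]
    w_gate = [[0.0, 0.0], [0.0, 0.0]]   # zero gate weights give a gate of 0.5
    w_value = [[1.0, 0.0], [0.0, 1.0]]  # identity keeps the hidden state as-is
    print(nlir(hidden, w_gate, w_value))
```

The multiplicative gate lets each output dimension be rescaled as a function of the input, which a purely linear projection cannot do.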

Topics

TokenFormer
Unified Recommendation
Sequential Collapse Propagation
BFTS Attention
Non-Linear Interaction Representation

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.