[D] Advice on sequential recommendation architectures

2026-02-15 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Recommendation Systems · Depth: Advanced, quick

Summary

A user is trying to model sequences of user actions with a GPT-2-style Transformer decoder, where each interaction is described by a set of attributes rather than an item ID. For instance, an interaction like "user clicked on a red button on the top left of the screen showing the word Hello" is tokenized into a sequence of attribute-value pairs. The goal is to predict a key downstream action, such as a purchase within seven days, by training with standard cross-entropy loss and measuring success with recall@k. Despite experiments with several GPT-2-based variants, including plain next-token prediction, extra weighting on down-funnel actions, and contrastive heads, the model barely beats naive baselines. This suggests either a mismatch between the modeling approach and the task, or that the action sequences carry little predictive signal for the target.
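To make the setup concrete, here is a minimal sketch of the flattened attribute-token representation described above, together with one common definition of recall@k. The attribute names, the `attribute=value` token format, and the `<eoe>` end-of-event marker are illustrative assumptions, not details taken from the original post.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical attribute schema: one interaction = a dict of attribute -> value.
Event = Dict[str, str]


def flatten_event(event: Event) -> List[str]:
    """Flatten one event into "attribute=value" tokens (sorted by attribute name)
    plus an end-of-event marker, as in a GPT-style token stream."""
    tokens = [f"{attr}={value}" for attr, value in sorted(event.items())]
    return tokens + ["<eoe>"]


def flatten_history(events: Sequence[Event]) -> List[str]:
    """Concatenate the flattened tokens of all events in chronological order."""
    stream: List[str] = []
    for event in events:
        stream.extend(flatten_event(event))
    return stream


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of relevant items that appear among the top-k ranked predictions."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


click = {"action": "click", "element": "button", "color": "red",
         "position": "top_left", "text": "Hello"}
print(flatten_event(click))
# ['action=click', 'color=red', 'element=button', 'position=top_left', 'text=Hello', '<eoe>']
print(recall_at_k(["view", "purchase", "add_to_cart"], {"purchase"}, k=2))  # 1.0
```

With this encoding each event costs one token per attribute plus a separator, so a history of only three such events already occupies 18 positions in the context window.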

Key takeaway

For AI engineers and data scientists whose sequential recommendation models underperform simple baselines: re-evaluate the data representation and the loss function. Flattening attributes into tokens may lead the model to learn token statistics instead of meaningful user behavior. Consider switching to event-level embeddings with an encoder-style model and a ranking loss, or a softmax loss when the catalog is small, to better capture sequential patterns.
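An event-level embedding collapses all attributes of one interaction into a single vector, so each event occupies one sequence position instead of several tokens. A minimal sketch, assuming one embedding per `attribute=value` pair and sum pooling; the random tables are stand-ins for trained weights, and the class and attribute names are hypothetical.

```python
import random
from typing import Dict, List

Event = Dict[str, str]  # hypothetical schema: attribute -> value


class EventEmbedder:
    """Turn each event into ONE vector by summing one embedding per
    attribute=value pair. The random tables stand in for trained weights."""

    def __init__(self, dim: int = 8, seed: int = 0) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.table: Dict[str, List[float]] = {}

    def _lookup(self, key: str) -> List[float]:
        # Lazily allocate a vector for each new attribute=value key.
        if key not in self.table:
            self.table[key] = [self.rng.gauss(0.0, 0.1) for _ in range(self.dim)]
        return self.table[key]

    def embed_event(self, event: Event) -> List[float]:
        vec = [0.0] * self.dim
        # Sorted iteration makes the sum independent of dict insertion order.
        for attr, value in sorted(event.items()):
            emb = self._lookup(f"{attr}={value}")
            vec = [a + b for a, b in zip(vec, emb)]
        return vec

    def embed_history(self, events: List[Event]) -> List[List[float]]:
        # One position per event, instead of one position per attribute token.
        return [self.embed_event(e) for e in events]


embedder = EventEmbedder(dim=8)
history = [
    {"action": "click", "element": "button", "color": "red", "position": "top_left", "text": "Hello"},
    {"action": "view", "element": "page", "color": "none", "position": "center", "text": "Pricing"},
    {"action": "click", "element": "button", "color": "green", "position": "bottom", "text": "Buy"},
]
seq = embedder.embed_history(history)
print(len(seq), len(seq[0]))  # 3 8
```

Summing keeps the vector width fixed no matter how many attributes an event has; concatenating the attribute embeddings and applying a learned projection is an alternative when interactions between attributes matter.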

Key insights

Poor sequential recommendation results often stem from mismatches in data representation and training objective, not just from the architecture.

Principles

Sequence order may carry little predictive signal for the target.
Event-level embeddings often outperform flattened attribute tokens.
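Whether order matters can be measured directly. One sketch, assuming you already have a training-and-evaluation pipeline (represented here by a hypothetical `train_and_evaluate` callback that returns a metric such as recall@k), is to run it once on the original histories and once on histories whose event order has been shuffled. Retraining on shuffled data, rather than only shuffling at test time, avoids penalizing an order-aware model for the distribution shift itself.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

E = TypeVar("E")


def shuffle_within_histories(histories: Sequence[List[E]], seed: int = 0) -> List[List[E]]:
    """Copy each history with its event order randomly permuted. Which events a
    user had (and any labels attached to the user) stay the same; only order is lost."""
    rng = random.Random(seed)
    shuffled = []
    for history in histories:
        copy = list(history)
        rng.shuffle(copy)
        shuffled.append(copy)
    return shuffled


def order_ablation(
    train_and_evaluate: Callable[[Sequence[List[E]]], float],
    histories: Sequence[List[E]],
    seed: int = 0,
) -> Tuple[float, float]:
    """Run the SAME training + evaluation pipeline on ordered and on shuffled
    histories. `train_and_evaluate` is a hypothetical callback returning a metric
    such as recall@k. A near-zero gap suggests order adds little signal."""
    ordered_score = train_and_evaluate(histories)
    shuffled_score = train_and_evaluate(shuffle_within_histories(histories, seed))
    return ordered_score, shuffled_score
```

If the two scores are close, the effort is probably better spent on features or the objective than on a more elaborate sequence model.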

Method

Consider feeding event-level embeddings to an encoder-style model like SASRec and training it with a ranking loss, rather than using GPT-style next-token prediction over attribute tokens.
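The core of this recipe can be sketched in a few lines: one causal self-attention step over event-level embeddings, followed by a pairwise ranking loss on the state at the last position. This is a simplified illustration, not SASRec itself: it omits learned Q/K/V projections, positional embeddings, the feed-forward block and layer normalization, and it swaps in a BPR-style loss, whereas the original SASRec paper trains with binary cross-entropy against sampled negatives. The toy vectors and the choice of negatives are assumptions.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def causal_self_attention(seq: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention in which position t attends
    only to positions 0..t. Q = K = V = seq (learned projections omitted)."""
    d = len(seq[0])
    out: List[Vector] = []
    for t, query in enumerate(seq):
        scores = [dot(query, seq[j]) / math.sqrt(d) for j in range(t + 1)]  # causal mask
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * seq[j][i] for j, w in enumerate(weights)) for i in range(d)])
    return out


def neg_log_sigmoid(x: float) -> float:
    # Stable -log(sigmoid(x)) = log(1 + exp(-x)).
    return math.log1p(math.exp(-x)) if x >= 0 else -x + math.log1p(math.exp(x))


def bpr_loss(state: Vector, pos_item: Vector, neg_items: List[Vector]) -> float:
    """Pairwise ranking loss: push score(next event) above score(each negative)."""
    pos = dot(state, pos_item)
    return sum(neg_log_sigmoid(pos - dot(state, n)) for n in neg_items) / len(neg_items)


events = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy event embeddings
hidden = causal_self_attention(events)
next_event, negatives = [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]]
print(round(bpr_loss(hidden[-1], next_event, negatives), 4))
```

The loss only cares about the relative order of the positive and the negatives, which matches a recall@k evaluation more closely than per-token cross-entropy over attribute tokens.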

In practice

First check whether sequential patterns help at all, e.g. against an order-agnostic baseline.
Try a full softmax loss when the catalog is small.
Study winning papers from RecSys competitions for ideas.
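When the set of targets is small enough that scoring every entry is cheap (for example a handful of down-funnel actions), the softmax over the whole catalog can be computed exactly, with no negative sampling. A minimal sketch, assuming dot-product scores between the sequence state and a table of target embeddings; the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def full_softmax_ce(state: Vector, catalog: List[Vector], target: int) -> float:
    """Cross-entropy of the target under a softmax over EVERY catalog entry
    (no negative sampling). Cost is O(len(catalog) * dim) per prediction."""
    logits = [sum(s * c for s, c in zip(state, item)) for item in catalog]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # log-sum-exp
    return log_z - logits[target]


def top_k(state: Vector, catalog: List[Vector], k: int) -> List[int]:
    """Indices of the k highest-scoring catalog entries, e.g. for recall@k."""
    scores = [sum(s * c for s, c in zip(state, item)) for item in catalog]
    return sorted(range(len(catalog)), key=lambda i: scores[i], reverse=True)[:k]
```

Because every target receives gradient on every step, the full softmax avoids the variance and bias of sampled negatives; its per-step cost grows linearly with catalog size, which is why it is mainly attractive for small catalogs.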

Topics

Sequential Recommendation
Transformer Architectures
User Behavior Modeling
Representation Learning
Loss Functions

Best for: Machine Learning Engineer, AI Engineer, Data Scientist


Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.