HSTU From Scratch in PyTorch - A Complete Walkthrough

· Source: MLWhiz: Recs|ML|GenAI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Intermediate, medium

Summary

This article walks through a from-scratch PyTorch implementation of the Hierarchical Sequential Transduction Unit (HSTU) model. It covers the fused (item, action) input layer, the core HSTU block with SiLU attention and Relative Attention Bias (RAB), and multi-task heads for retrieval and rating prediction. The guide uses the MovieLens-1M dataset, maps ratings to POSITIVE, NEUTRAL, and NEGATIVE actions, and applies a leave-one-out train/test split. The custom implementation is benchmarked against the reference HSTU from the "rectools" library and against SASRec, with results reported as HR@10 and NDCG@10. The post also covers the M-FALCON inference cache, for which it reports a 210x speedup, and the model is designed to train on a single GPU such as a Colab T4.
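
For readers new to the two reported metrics, the sketch below shows one way to compute HR@10 and NDCG@10 under a leave-one-out protocol, where each user has exactly one held-out item. The function names and the list-based input format are illustrative assumptions and are not taken from the article.

```python
import math

def hit_and_ndcg_at_k(ranked_items, target_item, k=10):
    """Return (hit, ndcg) for one user with a single held-out target item."""
    top_k = ranked_items[:k]
    if target_item not in top_k:
        return 0.0, 0.0
    rank = top_k.index(target_item)  # 0-based position in the ranking
    # With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 2).
    return 1.0, 1.0 / math.log2(rank + 2)

def evaluate(rankings, targets, k=10):
    """Average HR@k and NDCG@k over users (hypothetical helper)."""
    pairs = [hit_and_ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)]
    hr = sum(h for h, _ in pairs) / len(pairs)
    ndcg = sum(n for _, n in pairs) / len(pairs)
    return hr, ndcg
```

Here `rankings` holds each user's item IDs sorted by descending model score, and `targets` holds the single held-out item per user.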

Key takeaway

For Machine Learning Engineers building sequential recommender systems, this HSTU implementation is a working starting point for modeling user behavior as a sequence of (item, action) events. Consider combining fused item and action embeddings with SiLU attention and Relative Attention Bias to make the model more expressive. The M-FALCON inference cache matters at serving time: the article reports a 210x speedup with it, which helps make HSTU a viable option for production-scale recommendation engines.
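
To make SiLU attention and Relative Attention Bias more concrete, here is a minimal single-head sketch. It is not the article's HSTUBlock: the class name TinyHSTUBlock, the position-only bias (the time component is omitted), the division by sequence length, and the residual connection are all simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyHSTUBlock(nn.Module):
    """Single-head sketch: pointwise SiLU attention plus a learned relative position bias."""

    def __init__(self, dim: int, max_len: int):
        super().__init__()
        self.in_proj = nn.Linear(dim, 4 * dim)  # one projection yields U, V, Q, K
        self.out_proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)
        # One learned scalar per relative offset in [-(max_len - 1), max_len - 1].
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))
        self.max_len = max_len

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim), with seq_len <= max_len
        n = x.shape[1]
        u, v, q, k = F.silu(self.in_proj(x)).chunk(4, dim=-1)
        pos = torch.arange(n, device=x.device)
        offsets = pos[:, None] - pos[None, :] + self.max_len - 1  # (n, n) bias indices
        scores = q @ k.transpose(-1, -2) + self.rel_bias[offsets]
        # Pointwise SiLU replaces softmax; scaling by n is an assumed normalization.
        attn = F.silu(scores) / n
        causal = torch.tril(torch.ones(n, n, dtype=torch.bool, device=x.device))
        attn = attn.masked_fill(~causal, 0.0)  # each position sees only itself and the past
        # Gate the normalized attention output with U, then project and add the residual.
        return x + self.out_proj(self.norm(attn @ v) * u)
```

Because the attention weights are not normalized across the sequence, masking is done by zeroing future positions after the SiLU rather than by adding negative infinity before a softmax.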

Key insights

HSTU fuses item and action embeddings with time-aware attention for robust sequential recommendation.
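
One plausible way to fuse item and action information is to sum two embedding lookups, as sketched below. The article's FusedInputEmbedding may fuse them differently (for example by concatenation and projection); the class name, the summation, and the use of index 0 for padding are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class FusedInputEmbeddingSketch(nn.Module):
    """Hypothetical fused (item, action) embedding: the sum of two lookups."""

    def __init__(self, num_items: int, num_actions: int, dim: int):
        super().__init__()
        # Index 0 is reserved for padding in both tables (an assumption).
        self.item_emb = nn.Embedding(num_items + 1, dim, padding_idx=0)
        self.action_emb = nn.Embedding(num_actions + 1, dim, padding_idx=0)

    def forward(self, item_ids: torch.Tensor, action_ids: torch.Tensor) -> torch.Tensor:
        # item_ids, action_ids: (batch, seq_len) integer tensors
        return self.item_emb(item_ids) + self.action_emb(action_ids)
```

With three action types, `num_actions=3` gives IDs 1 to 3 for POSITIVE, NEUTRAL, and NEGATIVE, and padded positions map to a zero vector.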

Method

Process MovieLens-1M ratings into (item, action, time) triples, mapping ratings to POSITIVE/NEUTRAL/NEGATIVE actions. Implement FusedInputEmbedding, RelativeAttentionBias, HSTUBlock, and HSTUEncoder in PyTorch. Train with multi-task retrieval and rating heads.
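
The data preparation step can be sketched in plain Python as follows. The rating thresholds (4 and above is POSITIVE, 3 is NEUTRAL, below 3 is NEGATIVE) are an assumption, since this summary does not state them, and the function names and input tuple layout are hypothetical.

```python
from collections import defaultdict

POSITIVE, NEUTRAL, NEGATIVE = "POSITIVE", "NEUTRAL", "NEGATIVE"

def rating_to_action(rating):
    """Map a 1-5 star rating to an action label (thresholds are assumed)."""
    if rating >= 4:
        return POSITIVE
    if rating >= 3:
        return NEUTRAL
    return NEGATIVE

def build_leave_one_out(rows):
    """rows: iterable of (user_id, item_id, rating, timestamp) tuples.

    Returns (train, test): train maps each user to a time-ordered list of
    (item, action, time) triples, and test maps each user to the last triple.
    """
    by_user = defaultdict(list)
    for user, item, rating, ts in rows:
        by_user[user].append((ts, item, rating_to_action(rating)))
    train, test = {}, {}
    for user, events in by_user.items():
        events.sort(key=lambda e: e[0])  # chronological order
        triples = [(item, action, ts) for ts, item, action in events]
        if len(triples) < 2:
            continue  # need at least one training event and one test event
        train[user] = triples[:-1]
        test[user] = triples[-1]
    return train, test
```

Each user's most recent interaction is held out for testing, and everything before it forms the training sequence.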

Best for: Machine Learning Engineer, AI Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by MLWhiz: Recs|ML|GenAI.