Versioned Late Materialization for Ultra-Long Sequence Training in Recommendation Systems at Scale

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Cloud Computing & IT Infrastructure · Depth: Expert, extended

Summary

The "versioned late materialization" paradigm is a data infrastructure design for Deep Learning Recommendation Models (DLRMs) that addresses the storage and I/O bottlenecks of the traditional "Fat Row" approach. Fat Row pre-materializes User Interaction History (UIH) into every training example, which causes significant data redundancy and limits UIH sequence length scaling, especially in multi-tenant systems. The new paradigm instead stores UIH once in a normalized, immutable tier and reconstructs sequences just-in-time during training using lightweight versioned pointers. It ensures Online-to-Offline (O2O) consistency via a bifurcated protocol and employs read-optimized immutable storage with multi-dimensional projection pushdown. Disaggregated data preprocessing, pipelined I/O prefetching, and data-affinity optimizations mask reconstruction latency. The system is reported to reduce primary write bandwidth by 46.2% and per-job read bandwidth by 47-70%, and to enable UIH sequence scaling from 4K to 64K, with model quality gains such as a 1.2% cumulative Normalized Entropy (NE) improvement for Platform A.

Key takeaway

For MLOps Engineers scaling Deep Learning Recommendation Models with ultra-long User Interaction History, the traditional "Fat Row" paradigm creates an unsustainable storage and I/O wall. Consider evaluating a versioned late materialization architecture to eliminate data redundancy and enable aggressive sequence length scaling. The approach is reported to reduce write and read bandwidth and to deliver model quality gains (e.g., a 1.2% cumulative NE improvement for Platform A), letting systems move beyond the 4K sequence length bottleneck and support next-generation architectures like ULTRA-HSTU.

Key insights

Versioned late materialization eliminates UIH data redundancy by reconstructing sequences just-in-time from a single immutable source using lightweight versioned pointers.

Principles

UIH is append-only and immutable, allowing temporal reconstruction.
O2O consistency requires only that logged version metadata be sufficient to reconstruct the inference-time sequence, not physical pre-materialization.
Disaggregated preprocessing masks I/O latency for GPU-bound training.
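
The latency-masking principle can be illustrated with a minimal prefetching sketch. It is not the paper's implementation: the names prefetching_iter, load_fn, and depth are hypothetical, and a single background thread stands in for the disaggregated preprocessing tier. The idea shown is only that the next batches are loaded while the trainer consumes the current one.

```python
import queue
import threading


def prefetching_iter(batch_ids, load_fn, depth=2):
    """Yield load_fn(b) for each b in batch_ids, loading up to `depth` batches ahead.

    Hypothetical helper: `load_fn` stands in for the (slow) sequence
    reconstruction and preprocessing step; `depth` bounds the prefetch buffer.
    """
    buf = queue.Queue(maxsize=depth)

    def worker():
        try:
            for b in batch_ids:
                buf.put(("item", load_fn(b)))  # blocks when the buffer is full
            buf.put(("done", None))
        except Exception as exc:  # forward loader errors to the consumer
            buf.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()

    while True:
        kind, payload = buf.get()
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload


# Usage: the consumer sees batches in order while loading overlaps with consumption.
batches = list(prefetching_iter(range(5), lambda b: b * 2))
```

The bounded queue is the design choice worth noting: it lets I/O run ahead of the consumer without buffering an unbounded number of reconstructed sequences in memory.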

Method

The protocol bifurcates UIH into mutable and immutable portions. Inference-time snapshotting logs lightweight version metadata. Training-time "Time-Travel" reconstruction uses this metadata and bounded range scans to rebuild sequences.
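
A minimal sketch of the "Time-Travel" idea follows, covering only the immutable portion. It assumes, for illustration, that the version metadata is a timestamp watermark plus a maximum length; the summary does not specify the pointer's actual contents, and ImmutableUIHStore, VersionPointer, and their fields are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionPointer:
    """Hypothetical lightweight metadata logged at inference time."""
    user_id: str
    as_of_ts: int  # newest event timestamp visible to the model at inference
    max_len: int   # sequence length the model consumed


class ImmutableUIHStore:
    """Toy append-only store: one time-ordered event list per user."""

    def __init__(self):
        self._ts = {}      # user_id -> non-decreasing timestamps
        self._events = {}  # user_id -> item ids aligned with _ts

    def append(self, user_id, ts, item_id):
        ts_list = self._ts.setdefault(user_id, [])
        if ts_list and ts < ts_list[-1]:
            raise ValueError("append-only store expects non-decreasing timestamps")
        ts_list.append(ts)
        self._events.setdefault(user_id, []).append(item_id)

    def snapshot(self, user_id, request_ts, max_len):
        # Inference time: log a pointer instead of copying the sequence.
        return VersionPointer(user_id, request_ts, max_len)

    def reconstruct(self, ptr):
        # Training time: bounded range scan ending at the logged watermark.
        ts_list = self._ts.get(ptr.user_id, [])
        hi = bisect_right(ts_list, ptr.as_of_ts)
        lo = max(0, hi - ptr.max_len)
        return self._events.get(ptr.user_id, [])[lo:hi]


store = ImmutableUIHStore()
for ts, item in enumerate(["a", "b", "c", "d", "e"], start=1):
    store.append("u1", ts, item)

ptr = store.snapshot("u1", request_ts=3, max_len=2)
store.append("u1", 6, "f")            # later events must not leak into training
sequence = store.reconstruct(ptr)     # events visible as of ts=3, last 2 only
```

Because events are never rewritten, the scan at training time returns the same sequence the model saw at inference time, regardless of what was appended afterward.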

In practice

Implement hybrid storage for mutable and immutable UIH.
Use multi-dimensional projection for multi-tenant efficiency.
Apply data-affinity sharding for batch training I/O optimization.
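
Two of the practices above can be sketched under stated assumptions. Data affinity is interpreted here as placing training examples for the same user next to each other, so one history scan can serve several examples. Projection is interpreted as trimming along two dimensions, feature columns and sequence length. Both interpretations, and the names affinity_batches, scans_needed, and project_sequence, are illustrative rather than taken from the paper.

```python
from collections import defaultdict


def affinity_batches(examples, batch_size):
    """Reorder examples so rows sharing a user_id are adjacent, then cut batches."""
    by_user = defaultdict(list)
    for ex in examples:
        by_user[ex["user_id"]].append(ex)
    ordered = [ex for rows in by_user.values() for ex in rows]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def scans_needed(batches):
    """Count history scans if each batch reads every distinct user once."""
    return sum(len({ex["user_id"] for ex in batch}) for batch in batches)


def project_sequence(events, columns, max_len):
    """Keep only the requested feature columns and the most recent max_len events."""
    recent = events[-max_len:] if max_len > 0 else []
    return [{c: ev[c] for c in columns} for ev in recent]


examples = [{"user_id": u} for u in ["a", "b", "a", "b"]]
naive = [examples[0:2], examples[2:4]]       # arrival order: [a, b], [a, b]
grouped = affinity_batches(examples, 2)      # affinity order: [a, a], [b, b]

history = [{"item": i, "dwell": i * 10, "surface": "feed"} for i in range(5)]
slim = project_sequence(history, columns=["item"], max_len=2)
```

In this toy case the naive batching needs four scans and the grouped batching needs two. Grouping by user also changes the order in which examples reach the trainer, so any real use would have to weigh the I/O savings against the effect on shuffling.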

Topics

Recommendation Systems
Deep Learning Recommendation Models
User Interaction History
Late Materialization
Data Infrastructure
Online-to-Offline Consistency
Multi-tenant Architectures

Best for: AI Scientist, Research Scientist, AI Architect, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.