Versioned Late Materialization for Ultra-Long Sequence Training in Recommendation Systems at Scale
Summary
The "versioned late materialization" paradigm is a data infrastructure for Deep Learning Recommendation Models (DLRMs) designed to overcome the storage and I/O bottlenecks of the traditional "Fat Row" approach. Fat Row pre-materializes User Interaction History (UIH) into every training example, which causes heavy data redundancy and limits how far UIH sequence length can scale, especially in multi-tenant systems. The new paradigm instead stores UIH once in a normalized, immutable tier and reconstructs sequences just-in-time during training from lightweight versioned pointers. It preserves Online-to-Offline (O2O) consistency through a bifurcated protocol and uses read-optimized immutable storage with multi-dimensional projection pushdown. Disaggregated data preprocessing, pipelined I/O prefetching, and data-affinity optimizations mask the reconstruction latency. The reported results are a 46.2% reduction in primary write bandwidth and a 47-70% reduction in per-job read bandwidth, which enables UIH sequences to scale from 4K to 64K and yields model quality gains such as a 1.2% cumulative Normalized Entropy (NE) improvement for Platform A.
Key takeaway
If you are an MLOps engineer scaling Deep Learning Recommendation Models with ultra-long User Interaction History, the traditional "Fat Row" paradigm creates an unsustainable storage and I/O wall. Consider evaluating a versioned late materialization architecture to eliminate data redundancy and enable aggressive sequence length scaling. The approach is reported to reduce infrastructure costs and deliver model quality gains (e.g., a 1.2% NE improvement), letting systems move beyond the 4K sequence length bottleneck and support next-generation architectures like ULTRA-HSTU.
Key insights
Versioned late materialization eliminates UIH data redundancy by reconstructing sequences just-in-time from a single immutable source using lightweight versioned pointers.
Principles
- UIH is append-only and immutable, allowing temporal reconstruction.
- O2O consistency requires sufficiency, not physical pre-materialization.
- Disaggregated preprocessing masks I/O latency for GPU-bound training.
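The latency-masking principle can be illustrated with a minimal prefetching sketch. This is not the paper's implementation: the `prefetch` helper, its `depth` parameter, and the `fetch` callable are hypothetical names, and a single background thread stands in for a disaggregated preprocessing tier. The idea shown is only that reconstruction for upcoming examples runs while the trainer consumes the current one.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")

_DONE = object()  # sentinel marking the end of the stream


class _Failure:
    """Wraps an exception raised in the worker so the consumer can re-raise it."""

    def __init__(self, exc: Exception) -> None:
        self.exc = exc


def prefetch(pointers: Iterable[T], fetch: Callable[[T], U], depth: int = 4) -> Iterator[U]:
    """Yield fetch(p) for each pointer, computing up to `depth` results ahead.

    Hypothetical helper: `fetch` stands in for sequence reconstruction I/O.
    Output order matches input order because a single worker is used.
    """
    buf: "queue.Queue[object]" = queue.Queue(maxsize=depth)

    def worker() -> None:
        try:
            for p in pointers:
                buf.put(fetch(p))  # blocks when the buffer is full
        except Exception as exc:
            buf.put(_Failure(exc))
        finally:
            buf.put(_DONE)

    # Daemon thread: if the consumer stops early, the worker does not block exit.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = buf.get()
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item  # type: ignore[misc]


# Usage: the lambda is a placeholder for a real reconstruction call.
doubled = list(prefetch(range(5), lambda x: x * 2, depth=2))
```

The bounded queue is the design choice that matters here: it caps memory used by in-flight sequences while still keeping the consumer fed.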
Method
The protocol bifurcates UIH into mutable and immutable portions. Inference-time snapshotting logs lightweight version metadata. Training-time "Time-Travel" reconstruction uses this metadata and bounded range scans to rebuild sequences.
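A minimal sketch of the training-time reconstruction step follows, under stated assumptions. The names `VersionPointer`, `ImmutableUIHStore`, `as_of_ts`, and `max_len` are hypothetical and do not come from the paper, and only the immutable portion of UIH is modeled; how the mutable portion is handled is not covered here. The sketch shows why an append-only history makes a small pointer sufficient: a bounded range scan ending at the logged timestamp returns the same sequence no matter how many events arrive later.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VersionPointer:
    """Hypothetical metadata logged at inference time instead of the full sequence."""
    user_id: str
    as_of_ts: int  # events with a later timestamp were not visible at inference
    max_len: int   # upper bound on the reconstructed sequence length


class ImmutableUIHStore:
    """Toy append-only store: per-user events kept in timestamp order."""

    def __init__(self) -> None:
        self._ts: Dict[str, List[int]] = {}
        self._items: Dict[str, List[str]] = {}

    def append(self, user_id: str, ts: int, item: str) -> None:
        ts_list = self._ts.setdefault(user_id, [])
        if ts_list and ts < ts_list[-1]:
            raise ValueError("append-only store: timestamps must be non-decreasing")
        ts_list.append(ts)
        self._items.setdefault(user_id, []).append(item)

    def range_scan(self, ptr: VersionPointer) -> List[str]:
        """Return at most ptr.max_len most recent items with ts <= ptr.as_of_ts."""
        ts_list = self._ts.get(ptr.user_id, [])
        end = bisect_right(ts_list, ptr.as_of_ts)  # first index past the snapshot
        start = max(0, end - ptr.max_len)          # bound the scan length
        return self._items.get(ptr.user_id, [])[start:end]


store = ImmutableUIHStore()
for ts, item in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]:
    store.append("u1", ts, item)

ptr = VersionPointer(user_id="u1", as_of_ts=3, max_len=2)
before = store.range_scan(ptr)

# Later interactions do not change what the pointer reconstructs.
store.append("u1", 5, "e")
after = store.range_scan(ptr)
```

Both `before` and `after` hold the same two items, which is the consistency property the pointer is meant to provide in this toy model.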
In practice
- Implement hybrid storage for mutable and immutable UIH.
- Use multi-dimensional projection for multi-tenant efficiency.
- Apply data-affinity sharding for batch training I/O optimization.
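One way to picture the data-affinity idea is sketched below. It assumes that "affinity" means co-locating training examples that share a user so that one history read serves several examples; the summary does not spell out the mechanism, so this interpretation, along with the names `reconstruct_batch`, `fetch_history`, and the `(user_id, as_of_ts)` example layout, is hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (user_id, as_of_ts); hypothetical layout
Event = Tuple[int, str]    # (timestamp, item), sorted by timestamp per user


def reconstruct_batch(
    examples: List[Example],
    fetch_history: Callable[[str], List[Event]],
    max_len: int,
) -> Dict[Example, List[str]]:
    """Group examples by user, read each user's history once, slice per example."""
    cutoffs_by_user: Dict[str, List[int]] = defaultdict(list)
    for user_id, as_of_ts in examples:
        cutoffs_by_user[user_id].append(as_of_ts)

    out: Dict[Example, List[str]] = {}
    for user_id, cutoffs in cutoffs_by_user.items():
        events = fetch_history(user_id)  # single read shared by all examples of this user
        timestamps = [ts for ts, _ in events]
        for cutoff in cutoffs:
            end = bisect_right(timestamps, cutoff)
            start = max(0, end - max_len)
            out[(user_id, cutoff)] = [item for _, item in events[start:end]]
    return out


# Usage with an in-memory stand-in for the storage tier that counts reads.
history = {
    "u1": [(1, "a"), (2, "b"), (3, "c")],
    "u2": [(1, "x")],
}
reads: List[str] = []


def fetch_history(user_id: str) -> List[Event]:
    reads.append(user_id)
    return history.get(user_id, [])


batch = [("u1", 2), ("u2", 1), ("u1", 3)]
sequences = reconstruct_batch(batch, fetch_history, max_len=2)
```

In this toy run three examples are served by two history reads; the saving grows with the number of examples per user that land in the same shard or batch.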
Topics
- Recommendation Systems
- Deep Learning Recommendation Models
- User Interaction History
- Late Materialization
- Data Infrastructure
- Online-to-Offline Consistency
- Multi-tenant Architectures
Best for: AI Scientist, Research Scientist, AI Architect, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.