NAND Reimagined in High-Bandwidth Flash to Complement HBM

· Source: Big Data & AI News - EE Times · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation, Cloud Computing & IT Infrastructure · Depth: Intermediate, short

Summary

High-bandwidth flash (HBF) is an emerging memory technology that stacks multiple NAND dies vertically to raise capacity and read throughput for AI inference workloads. Where high-bandwidth memory (HBM) stacks DRAM dies, HBF stacks 3D NAND arrays and exposes them through many parallel I/O paths. SK Hynix recently presented a hybrid memory architecture that pairs eight HBM3E stacks with eight HBF stacks alongside an Nvidia Blackwell (B200) GPU, and reported a 2.69× improvement in performance per watt over an HBM-only configuration. HBF borrows HBM's vertical-stacking approach and adds a parallel sub-array architecture with independent read/write channels, which provides high-density integration and suits read-intensive inference tasks. SK Hynix and Sandisk anticipate trial versions and samples in 2026, with commercial products expected by early 2027.

Key takeaway

For machine learning engineers designing AI inference systems, HBF addresses the memory wall by offering significantly more capacity than HBM at lower cost. Consider it as part of a hybrid memory architecture, either for read-intensive edge AI applications or as a capacity extension for HBM in data centers, to improve performance per watt and hold terabytes of model data.
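One way to picture the hybrid arrangement is as a placement decision: data that is written once and then only read (such as model weights) goes to the flash tier, while frequently rewritten data stays in HBM. The sketch below is purely illustrative. The `Tensor` class, the `place` function, and all sizes are hypothetical and are not taken from the SK Hynix design or any vendor API.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    size_gb: float
    read_mostly: bool  # True for data written once, then only read (e.g. weights)


def place(tensors, hbm_capacity_gb):
    """Assign each tensor to a tier: write-heavy data to HBM, read-mostly data to HBF.

    Hypothetical policy for illustration only; real systems would also
    weigh latency, endurance, and bandwidth per tier.
    """
    placement = {}
    hbm_free = hbm_capacity_gb
    # Handle write-heavy tensors first (False sorts before True).
    for t in sorted(tensors, key=lambda t: t.read_mostly):
        if t.read_mostly:
            placement[t.name] = "HBF"
        elif t.size_gb <= hbm_free:
            placement[t.name] = "HBM"
            hbm_free -= t.size_gb
        else:
            raise ValueError(f"{t.name} does not fit in remaining HBM")
    return placement


# Made-up sizes: 400 GB of weights, 100 GB of KV cache, 192 GB of HBM.
example = [
    Tensor("weights", 400.0, read_mostly=True),
    Tensor("kv_cache", 100.0, read_mostly=False),
]
result = place(example, hbm_capacity_gb=192.0)
```

With these made-up numbers, `result` maps `kv_cache` to HBM and `weights` to HBF, which mirrors the "capacity extension" role described above.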

Key insights

HBF stacks NAND dies for high-capacity, high-throughput memory, complementing HBM in AI inference architectures.

Method

HBF combines 3D NAND flash with advanced packaging and interconnection, using a parallel sub-array architecture for independent read/write channels to achieve high throughput.
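To see why independent channels raise throughput, consider a simple model in which a read is striped evenly across channels that all operate in parallel, so the transfer finishes when the largest stripe finishes. This is a back-of-the-envelope sketch, not a model of any real HBF device. The function name, channel counts, and per-channel rates are assumptions chosen for illustration.

```python
import math


def read_time_s(bytes_to_read, num_channels, per_channel_bytes_per_s):
    """Time to read data striped evenly across independent channels.

    Idealized model: channels run fully in parallel with no contention,
    so completion time is set by the largest stripe.
    """
    stripe = math.ceil(bytes_to_read / num_channels)
    return stripe / per_channel_bytes_per_s


# Illustrative numbers only: the same 1 MB read on 1 channel vs 8 channels.
single = read_time_s(1_000_000, num_channels=1, per_channel_bytes_per_s=1000)
parallel = read_time_s(1_000_000, num_channels=8, per_channel_bytes_per_s=1000)
speedup = single / parallel
```

In this idealized model the speedup equals the channel count. Real devices would fall short of that because of contention, command overhead, and uneven access patterns.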

Best for: Machine Learning Engineer, NLP Engineer, Computer Vision Engineer, AI Architect, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Big Data & AI News - EE Times.