New Server Hopes to Break Through AI’s “Memory Wall”

2026-06-01 · Source: IEEE Spectrum · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Advanced, quick

Summary

Majestic Labs, an AI hardware startup, is developing Prometheus, an AI server designed to overcome the "memory wall" bottleneck in large language model (LLM) inference. The server will hold up to 128 terabytes of memory, over 60 times more than Nvidia's DGX B300 server. Prometheus uses a DRAM-centric architecture that combines LPDDR6 with a proprietary memory interface built on miniature copper cables, reaching memory bandwidth of up to 25.6 terabytes per second. Its compute engine is the custom Ignite AI processing unit, which integrates ARM application cores with RISC-V vector and tensor cores on a single die; each server carries 12 Ignite chips. Prometheus will support PyTorch, vLLM, and OpenAI's Triton without code modifications. The Open Compute Project-compliant server, expected to ship in 2027, will support modular memory upgrades, up to 120 kilowatts per rack, and cold-plate liquid cooling. Majestic Labs claims the design will cut capital expenditure and power consumption by a factor of 10 to 50 compared with current solutions.
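To put the headline figures in proportion, the short sketch below does the arithmetic they imply. It uses only the numbers quoted in the summary, which are vendor claims rather than measurements; the function names are illustrative and not part of any Majestic Labs tooling.

```python
# Figures quoted in the summary (vendor claims, not measurements).
CAPACITY_TB = 128.0      # maximum memory per Prometheus server, in terabytes
BANDWIDTH_TBPS = 25.6    # peak memory bandwidth, in terabytes per second
CAPACITY_RATIO = 60.0    # "over 60 times" the DGX B300's memory


def full_sweep_seconds(capacity_tb: float, bandwidth_tbps: float) -> float:
    """Time to read every byte of memory once at peak bandwidth."""
    return capacity_tb / bandwidth_tbps


def implied_baseline_tb(capacity_tb: float, ratio: float) -> float:
    """Upper bound on the baseline capacity implied by an 'over ratio x' claim."""
    return capacity_tb / ratio


sweep = full_sweep_seconds(CAPACITY_TB, BANDWIDTH_TBPS)
baseline = implied_baseline_tb(CAPACITY_TB, CAPACITY_RATIO)
print(f"Full-memory sweep at peak bandwidth: about {sweep:.1f} s")
print(f"Implied comparison capacity: under about {baseline:.2f} TB")
```

At the quoted peak rate, touching all 128 terabytes once takes roughly five seconds, so capacity and bandwidth should be weighed together when sizing a workload.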

Key takeaway

For AI architects and machine learning engineers designing LLM inference infrastructure, Majestic Labs' Prometheus server proposes a shift from HBM-limited accelerators to a DRAM-centric, high-capacity design. If memory bottlenecks and cost are constraining your deployments, evaluate whether this architecture can scale LLM inference more economically than current HBM-limited systems, especially as models continue to grow. Treat the claimed 10- to 50-fold reductions in capital expenditure and power consumption as vendor projections until the server ships, which is expected in 2027.

Key insights

Majestic Labs' Prometheus server aims to break the LLM memory wall with a DRAM-centric architecture and integrated AI processing.

Principles

LLM token generation is memory-bound.
HBM limits memory capacity due to interface design.
Unified DRAM architecture can scale memory.
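The first principle can be made concrete with a common back-of-the-envelope bound, sketched here under simplifying assumptions: a dense model, one sequence at a time, and every weight read from memory once per generated token. KV-cache traffic, batching, sparsity, and compute time are ignored, so real throughput will differ. The 1-trillion-parameter model and 1 byte per parameter are hypothetical inputs; only the 25.6 TB/s bandwidth comes from the summary.

```python
TB = 1e12  # bytes per terabyte (decimal)


def decode_tokens_per_second(
    params: float, bytes_per_param: float, bandwidth_bytes_per_s: float
) -> float:
    """Upper bound on single-stream decode rate when weight reads dominate.

    Each generated token requires streaming all weights from memory once,
    so the rate cannot exceed bandwidth divided by total weight bytes.
    """
    weight_bytes = params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes


# Hypothetical dense model: 1e12 parameters stored at 1 byte each (1 TB of weights).
bound = decode_tokens_per_second(
    params=1e12, bytes_per_param=1.0, bandwidth_bytes_per_s=25.6 * TB
)
print(f"Bandwidth-limited bound: about {bound:.1f} tokens/s per sequence")
```

Because the bound depends on bandwidth rather than arithmetic throughput, adding compute alone does not raise it, which is the sense in which token generation is memory-bound.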

Method

Majestic Labs' Prometheus server uses a proprietary copper cable memory interface and custom aggregation chips to scale LPDDR6 DRAM, paired with Ignite AI processors for unified compute.

In practice

Consider DRAM-centric designs for LLM memory scaling.
Explore unified compute/memory architectures for inference.
Evaluate systems supporting existing AI frameworks (PyTorch, vLLM, Triton).
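When weighing a high-capacity design, a useful first check is how much memory a deployment needs for weights plus KV cache. The sketch below uses the standard KV-cache size estimate for a transformer with grouped-query attention. Every model dimension in it (layer count, KV heads, head size, context length, batch size, weight size) is a hypothetical example, not a figure from the article; only the 128 TB capacity is quoted from the summary.

```python
def kv_cache_bytes(
    layers: int,
    kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch: int,
    bytes_per_elem: int = 2,
) -> int:
    """KV-cache size: one key and one value vector per layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


def fits(weight_bytes: int, cache_bytes: int, capacity_bytes: int) -> bool:
    """True if weights plus KV cache fit in the given memory capacity."""
    return weight_bytes + cache_bytes <= capacity_bytes


TB = 10**12
# Hypothetical deployment: 80 layers, 8 KV heads of size 128, 128,000-token
# contexts, 64 concurrent sequences, 16-bit cache entries, 1 TB of weights.
cache = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=128_000, batch=64)
weights = 1 * TB
print(f"KV cache: {cache / TB:.2f} TB")
print("Fits in 128 TB:", fits(weights, cache, 128 * TB))
```

The cache grows linearly with both context length and batch size, so long-context, high-concurrency serving is where capacity limits tend to bite first.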

Topics

AI Hardware
LLM Inference
DRAM Architecture
Prometheus Server
Ignite Processor
Memory Wall

Best for: AI Engineer, NLP Engineer, AI Hardware Engineer, AI Architect, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by IEEE Spectrum.