I got an old server with lots of RAM, but no GPU, and ended up getting Grok 2 running anyway ;)

2026-04-30 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, quick

Summary

A user ran Grok 2, a large language model, on a Dell R640 1U server with dual Xeon Platinum 8268 processors and 1.5TB of 2666MHz RAM, with no dedicated GPU. The setup reached a prompt processing speed of 4.73 tokens/second and a generation speed of 1.35 tokens/second, with a 512K context and web search. The configuration used NUMA and 40 threads. The user is now asking how to fit Tesla GPUs into the 1U server's stock risers without physical modification, and for general recommendations on similar GPU-less AI builds.
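To put the reported speeds in perspective, the arithmetic below estimates how long a single request would take. It is a rough sketch: the two throughput figures come from the post, while the request sizes are hypothetical and the calculation assumes throughput stays constant, which is optimistic since speeds usually drop as the context grows.

```python
PROMPT_TOK_PER_S = 4.73  # prompt processing speed reported in the post
GEN_TOK_PER_S = 1.35     # generation speed reported in the post


def estimate_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate wall-clock seconds for one request at the reported speeds.

    Assumes constant throughput regardless of context length (a simplification).
    """
    return prompt_tokens / PROMPT_TOK_PER_S + output_tokens / GEN_TOK_PER_S


if __name__ == "__main__":
    # Hypothetical request sizes, chosen only for illustration.
    for prompt, output in [(200, 100), (2_000, 300), (20_000, 500)]:
        minutes = estimate_seconds(prompt, output) / 60
        print(f"{prompt:>6} prompt + {output:>3} output tokens: ~{minutes:.1f} min")
```

At these rates, long prompts dominate the wait, so filling anything close to the 512K context would be a very long job on this hardware.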

Key takeaway

AI engineers evaluating LLM deployment on existing server infrastructure without GPUs should note that high-RAM, multi-core CPU servers can run models like Grok 2. Performance is lower than on GPU-accelerated setups (here, 1.35 tokens/second for generation), so the approach works best as an interim solution or for less demanding inference tasks. Before buying GPUs for such a server, check the card dimensions against the server's riser compatibility.

Key insights

Large language models like Grok 2 can operate on CPU-only servers with substantial RAM, albeit with reduced performance.

Principles

High RAM capacity lets a large model fit in memory without a GPU, though it does not recover GPU-level throughput.
NUMA-aware configuration can improve CPU-based LLM performance on multi-socket servers.
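The first principle is at heart a capacity check: do the model weights fit in system RAM with room to spare? A minimal sketch follows. The 1.5TB figure comes from the post, but the parameter count, bit widths, and 80% headroom factor are hypothetical placeholders, not Grok 2's published figures.

```python
RAM_GB = 1500  # 1.5TB of system RAM, as described in the post


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         ram_gb: float = RAM_GB, headroom: float = 0.8) -> bool:
    """True if the weights fit within a fraction of RAM.

    The headroom factor is an assumed allowance for the KV cache,
    the OS, and other processes; tune it for a real deployment.
    """
    return weights_gb(params_billion, bits_per_weight) <= ram_gb * headroom


if __name__ == "__main__":
    # Placeholder model size for illustration only.
    for bits in (16, 8, 4):
        size = weights_gb(300, bits)
        print(f"300B params at {bits}-bit: ~{size:.0f} GB, fits: {fits(300, bits)}")
```

Fitting in RAM only says the model can load. Generation speed is a separate question.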

In practice

Use high-capacity RAM servers for CPU-only LLM inference.
Tune NUMA settings and thread counts for performance; the setup described used 40 threads.
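As a starting point for the NUMA and thread tuning mentioned above, the sketch below reads the NUMA layout from Linux sysfs and splits a thread budget evenly across nodes. It is an illustration, not the configuration used in the post: the helper names are hypothetical, the even split is just one possible policy, and the CPU numbering in the example is invented.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist string such as '0-3,8-11' into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_numa_nodes(root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map NUMA node id -> logical CPU ids from sysfs; empty if unavailable."""
    nodes: dict[int, list[int]] = {}
    for node_dir in sorted(Path(root).glob("node[0-9]*")):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            nodes[int(node_dir.name[4:])] = parse_cpulist(cpulist.read_text())
    return nodes


def split_threads(total_threads: int, nodes: dict[int, list[int]]) -> dict[int, int]:
    """Divide a thread budget evenly across nodes, capped by CPUs per node."""
    if not nodes:
        return {}
    base, extra = divmod(total_threads, len(nodes))
    plan: dict[int, int] = {}
    for i, (node, cpus) in enumerate(sorted(nodes.items())):
        plan[node] = min(base + (1 if i < extra else 0), len(cpus))
    return plan


if __name__ == "__main__":
    nodes = read_numa_nodes()
    if not nodes:
        # Invented two-socket layout, used only when sysfs is not available.
        nodes = {0: parse_cpulist("0-23,48-71"), 1: parse_cpulist("24-47,72-95")}
    print(split_threads(40, nodes))  # 40 threads, as in the post
```

The cpulist files report logical CPUs, so hyperthreads are counted. Whether to pin threads to physical cores only, and whether an even split beats filling one socket first, is something to measure on the actual machine.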

Topics

Grok 2
Dell R640
Server Hardware
Large Language Models
GPU-less AI

Best for: Machine Learning Engineer, AI Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.