Tensordyne Claims Massive Speed and Power Improvement Over Nvidia

· Source: IEEE Spectrum · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Advanced, short

Summary

Tensordyne, a new startup, claims that its Napier AI chip and 72-chip system will significantly outperform Nvidia's GB300 in energy efficiency and latency for LLM inference. Simulations suggest the 72-chip system can run large LLMs four times faster on one-fifth the power of a system built from 72 Nvidia GB300s. Commercial sales are scheduled for the second half of 2027, with real system validation expected by the end of the year. The core innovation is Napier's matrix multiplication method, which replaces multipliers with more energy-efficient adders by working in logarithmic math, where multiplying two numbers becomes adding their logarithms. This approach, combined with proprietary linear-to-logarithm conversion technology, allows for denser compute and power savings. The system also features 144 gigabytes of HBM and a custom 1-microsecond-latency network, enabling it to handle both the computationally heavy LLM prefill stage and the memory-dependent decode stage within a single, compact "pod" system. A four-pod rack, designed for a 2-trillion-parameter LLM, is projected to deliver 1,300 tokens per second per user at $11 per million tokens while consuming 120 kilowatts.
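As a rough check on what the headline figures would imply if they hold, the sketch below works through the arithmetic. It assumes the 4x speed and one-fifth power claims apply to the same fixed workload at the same time, which the summary does not confirm. The function names are hypothetical and introduced only for illustration.

```python
def energy_ratio(speedup, power_ratio):
    """Energy for a fixed workload is power x time.

    Running `speedup` times faster divides the time by `speedup`, so the
    energy ratio is power_ratio / speedup. Assumption: both claimed figures
    hold simultaneously for the same workload.
    """
    return power_ratio / speedup


def seconds_per_million_tokens(tokens_per_second):
    """Time for a single user stream to receive one million tokens."""
    return 1_000_000 / tokens_per_second


if __name__ == "__main__":
    # Claimed figures from the summary: 4x faster at one-fifth the power.
    ratio = energy_ratio(speedup=4.0, power_ratio=1 / 5)
    print(f"Energy per unit of work relative to baseline: {ratio:.3f}")

    # Projected rack figure from the summary: 1,300 tokens per second per user.
    seconds = seconds_per_million_tokens(1300)
    print(f"Seconds per million tokens for one user: {seconds:.0f}")
```

Under these assumptions, the two claims together would amount to roughly one-twentieth of the energy per unit of work, and one user stream would take about 13 minutes to receive a million tokens at the projected rate.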

Key takeaway

For AI Architects and Machine Learning Engineers evaluating next-generation inference hardware, Tensordyne's Napier chip presents a compelling, albeit unproven, alternative to traditional GPU architectures. Its claimed 4x speedup at one-fifth the power for LLM inference, achieved through novel logarithmic math, could drastically lower operational costs if it holds up on real hardware. You should monitor Tensordyne's beta availability later this year to validate these performance claims before committing to large-scale deployments.

Key insights

Tensordyne's Napier chip uses logarithmic math for matrix multiplication, which the company claims, based on simulations, yields significant gains in AI inference speed and power efficiency.

Method

Napier converts linear numbers to logarithmic form, carries out the multiplications of a matrix multiply as additions in the log domain, and then converts the results back. Tensordyne's proprietary conversion techniques, which the company describes as accurate and low-cost in silicon, are what allow adders to replace energy-intensive multipliers.
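The idea rests on the identity log2(a × b) = log2(a) + log2(b). The following is a minimal software sketch of that principle, not Tensordyne's implementation: the proprietary conversion hardware is not described in the source, so ordinary floating-point `math.log2` and exponentiation stand in for it. Accumulating the products in the linear domain is also an assumption, and all function names are hypothetical.

```python
import math


def to_log(x):
    """Encode x as (sign, log2|x|). A sign of 0 marks an exact zero."""
    if x == 0:
        return (0, 0.0)
    return (1 if x > 0 else -1, math.log2(abs(x)))


def from_log(sign, exponent):
    """Convert a (sign, log2 magnitude) pair back to a linear value."""
    if sign == 0:
        return 0.0
    return sign * 2.0 ** exponent


def log_multiply(a, b):
    """Multiply two log-encoded values.

    The magnitudes are combined with a single addition. The signs are
    combined with a product here for simplicity; on signs of +1 and -1
    that is equivalent to a one-bit XOR.
    """
    return (a[0] * b[0], a[1] + b[1])


def log_dot(xs, ys):
    """Dot product whose element-wise multiplies are done in the log domain."""
    total = 0.0
    for x, y in zip(xs, ys):
        sign, exponent = log_multiply(to_log(x), to_log(y))
        # Assumption: each product is converted back and summed linearly.
        total += from_log(sign, exponent)
    return total


def log_matmul(a, b):
    """Matrix multiply built from log-domain dot products (lists of rows)."""
    columns = list(zip(*b))
    return [[log_dot(row, col) for col in columns] for row in a]


if __name__ == "__main__":
    print(log_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Results match an ordinary matrix multiply up to floating-point rounding. In hardware, the accuracy and cost of the two conversion steps decide whether the approach pays off, which is the part the source attributes to Tensordyne's proprietary technology.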

Best for: AI Hardware Engineer, AI Architect, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by IEEE Spectrum.