Tensordyne Tapes Out LNS-Based AI Chip, Claims Huge Power Advantages
Summary
AI chip startup Tensordyne has taped out its data center inference chip, claiming an order-of-magnitude power efficiency improvement over leading GPUs. The company states its systems achieve 17x tokens per second per Watt and 13x tokens per second per rack compared to Nvidia GB300-based systems for the same workload. Built on TSMC 3 nm, the chip consumes 300 W, offers 2.1 PFLOPS (dense FP8) compute, and includes 144 GB HBM3e. Tensordyne's proprietary Pareto number system, based on the Logarithmic Number System (LNS) with dedicated hardware acceleration, underpins this advantage. Their 72-chip Napier server, air-cooled at 30 kW, holds 10 TB of HBM, sufficient for a 10T FP4 model. Full racks deliver 608 PFLOPS and 42 TB HBM at 120 kW. Development cloud access is planned by late 2026, with systems shipping by Q2 2027.
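The capacity and rack figures above can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted in the summary, plus three labeled assumptions that are not in the source: decimal units (1 TB = 1000 GB), 0.5 bytes per FP4 parameter with no allowance for KV cache or activations, and a rack made of four Napier servers (inferred from 120 kW / 30 kW).

```python
# Figures quoted in the summary.
CHIP_HBM_GB = 144
CHIP_POWER_W = 300
CHIP_PFLOPS_FP8 = 2.1
CHIPS_PER_SERVER = 72
SERVER_POWER_KW = 30
RACK_POWER_KW = 120

# Assumptions (not from the source): decimal units, 4 bits per FP4 weight.
GB_PER_TB = 1000
BYTES_PER_FP4_PARAM = 0.5

# One Napier server: total HBM and the power drawn by the chips alone.
server_hbm_tb = CHIPS_PER_SERVER * CHIP_HBM_GB / GB_PER_TB
server_chip_power_kw = CHIPS_PER_SERVER * CHIP_POWER_W / 1000

# Weight footprint of a 10-trillion-parameter model stored in FP4.
model_params = 10e12
model_weights_tb = model_params * BYTES_PER_FP4_PARAM / 1e12

# Assumption: a rack is as many servers as the quoted power budget allows.
servers_per_rack = RACK_POWER_KW // SERVER_POWER_KW
rack_chips = servers_per_rack * CHIPS_PER_SERVER
rack_pflops = rack_chips * CHIP_PFLOPS_FP8
rack_hbm_tb = rack_chips * CHIP_HBM_GB / GB_PER_TB

print(f"server HBM: {server_hbm_tb:.3f} TB, FP4 weights: {model_weights_tb:.1f} TB")
print(f"chip power per server: {server_chip_power_kw:.1f} kW of {SERVER_POWER_KW} kW")
print(f"rack: {rack_chips} chips, {rack_pflops:.1f} PFLOPS, {rack_hbm_tb:.1f} TB HBM")
```

Under these assumptions a server holds 10.368 TB of HBM against 5 TB of FP4 weights, and a four-server rack comes out at about 605 PFLOPS and 41.5 TB, close to the quoted 608 PFLOPS and 42 TB. The small gap suggests the quoted rack figures are rounded or computed slightly differently.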
Key takeaway
For AI Architects and MLOps Engineers evaluating next-generation inference hardware, Tensordyne's LNS-based chip presents a compelling alternative to traditional GPU solutions. You should investigate its claimed 17x power efficiency and $11 per million tokens cost for large language models. This is especially relevant if your workloads involve 10T+ parameter models or agentic AI. Consider piloting their development cloud by late 2026 to characterize performance for your specific applications before Q2 2027 system shipments.
Key insights
Tensordyne's LNS-based AI chip is claimed to offer significant power and cost efficiency for large-model inference through its logarithmic number format and dedicated hardware.
Principles
- Dedicated LNS hardware acceleration boosts efficiency.
- Proprietary math systems can yield order-of-magnitude gains.
- Cell-based NoC reduces tail latency in distributed systems.
Method
Tensordyne's software stack handles all LNS conversions, so users do not work with the proprietary number format directly. According to the company, AI agents can convert GPU-specific code written for various frameworks to run on its hardware.
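To show what an LNS conversion involves, here is a minimal sketch of a generic textbook logarithmic number system. It is not Tensordyne's Pareto format, whose details the source does not describe. The function names (`to_lns`, `from_lns`, `lns_mul`, `lns_add`) and the sign-plus-log2 encoding are illustrative assumptions, and the sketch uses full-precision floats where real hardware would use a narrow fixed-point exponent.

```python
import math

# Hypothetical encoding: (sign, log2 of the magnitude). Zero is not representable here.
LNS = tuple[int, float]

def to_lns(x: float) -> LNS:
    """Convert a nonzero real number to its (sign, log2|x|) form."""
    if x == 0:
        raise ValueError("zero needs a special encoding; omitted in this sketch")
    return (1 if x > 0 else -1, math.log2(abs(x)))

def from_lns(v: LNS) -> float:
    """Convert back to an ordinary float."""
    sign, exponent = v
    return sign * 2.0 ** exponent

def lns_mul(a: LNS, b: LNS) -> LNS:
    """Multiplication is a sign product plus one addition of exponents."""
    return (a[0] * b[0], a[1] + b[1])

def lns_add(a: LNS, b: LNS) -> LNS:
    """Addition needs a correction term: log2(1 +/- 2^d), with d <= 0."""
    (sa, ea), (sb, eb) = a, b
    if ea < eb:  # make `a` the operand with the larger magnitude
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    d = eb - ea
    if sa == sb:
        return (sa, ea + math.log2(1.0 + 2.0 ** d))
    if d == 0:
        raise ValueError("exact cancellation gives zero; not encoded in this sketch")
    return (sa, ea + math.log2(1.0 - 2.0 ** d))

product = from_lns(lns_mul(to_lns(3.0), to_lns(-4.0)))
total = from_lns(lns_add(to_lns(3.0), to_lns(5.0)))
print(product, total)
```

The appeal for inference hardware is that multiplication, the dominant operation in matrix math, reduces to an addition in the log domain. The cost moves to addition, which needs the correction function, and that is typically where dedicated hardware support matters.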
In practice
- Evaluate LNS-based hardware for 10T+ parameter model inference.
- Consider cell-based NoC designs for low-latency distributed AI.
- Utilize AI agents for framework-agnostic code translation.
Topics
- AI Inference Chips
- Logarithmic Number System
- Tensordyne Napier
- Data Center AI
- Network-on-Chip
- Large Language Models
Best for: Investor, CTO, VP of Engineering/Data, AI Hardware Engineer, AI Architect, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Big Data & AI News - EE Times.