Etched - Building AI Hardware to Make Inference Faster and Cheaper - [Invest Like the Best, EP.480]

· Source: Invest Like the Best with Patrick O'Shaughnessy · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Emerging Technologies & Innovation · Depth: Intermediate, extended

Summary

Etched, founded in 2023 by Gavin Uberti and Rob Wachen, taped out a working AI chip on its first attempt, making it the first post-ChatGPT hardware company to do so. The company has raised $800 million, has secured more than $1 billion in customer demand, and builds chips and systems designed to make AI model inference faster and cheaper. Its initial product is a complete rack solution spanning the chip, boards, power delivery, interconnects, and manufacturing. The architecture rests on two technical bets. The first is "low voltage inference," which runs the silicon at lower voltage to avoid thermal throttling and reach higher FLOP density. The second is "cluster scale memory," which uses custom interconnects to cut chip-to-chip latency substantially and use memory bandwidth more efficiently across a whole cluster. Together, these aim to enable massive concurrency and faster task completion for AI models.
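The intuition behind "low voltage inference" can be illustrated with the textbook first-order model of CMOS dynamic power, P ≈ α·C·V²·f. This is a generic sketch under stated assumptions, not Etched's design or data: it ignores leakage power, minimum operating voltage, and area limits, and the operating points are hypothetical. Under a fixed power (thermal) budget, throughput in this model scales as 1/V², so a lower voltage leaves room for more active compute before throttling.

```python
def unit_power(voltage: float, freq_ghz: float, cap: float = 1.0, activity: float = 1.0) -> float:
    """Dynamic power of one compute unit, P = activity * C * V^2 * f (arbitrary units)."""
    return activity * cap * voltage**2 * freq_ghz


def throughput_under_cap(power_cap: float, voltage: float, freq_ghz: float) -> float:
    """Relative throughput when as many units as the power cap allows are active.

    Assumes throughput ~ active_units * frequency, ignores leakage power,
    and assumes there is enough silicon area to host the extra units.
    """
    active_units = power_cap / unit_power(voltage, freq_ghz)
    return active_units * freq_ghz


# Hypothetical operating points: nominal vs. a lower voltage and frequency.
nominal = throughput_under_cap(power_cap=100.0, voltage=1.0, freq_ghz=2.0)
low_v = throughput_under_cap(power_cap=100.0, voltage=0.7, freq_ghz=1.4)
print(f"throughput gain at low voltage: {low_v / nominal:.2f}x")
```

With these hypothetical numbers the low-voltage point delivers about 2.04x the throughput within the same power cap. Frequency cancels out of the ratio, so in this simplified model the gain depends only on the voltage ratio.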

Key takeaway

For AI hardware engineers and directors of AI/ML planning future data center infrastructure, Etched's specialized approach to inference hardware signals a critical shift: general-purpose chips are becoming less efficient for modern AI workloads. Consider evaluating vertically integrated solutions that prioritize "low voltage inference" and "cluster scale memory," which aim to deliver orders of magnitude more concurrency and a lower cost per token. That combination shortens wall-clock time for complex AI tasks and supports the very large user bases future AI applications will require.
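The link between concurrency and cost per token is simple arithmetic: a system with a fixed hourly cost spreads that cost over every token it generates. The sketch below uses hypothetical prices and rates, not figures from the episode, and assumes the system is fully utilized.

```python
def cost_per_million_tokens(system_cost_per_hour: float,
                            concurrent_streams: int,
                            tokens_per_sec_per_stream: float) -> float:
    """Dollars per million generated tokens, assuming the system is fully utilized."""
    tokens_per_hour = concurrent_streams * tokens_per_sec_per_stream * 3600
    return system_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical figures: same hourly cost and per-stream speed, different concurrency.
low = cost_per_million_tokens(100.0, concurrent_streams=64, tokens_per_sec_per_stream=50)
high = cost_per_million_tokens(100.0, concurrent_streams=1024, tokens_per_sec_per_stream=50)
print(f"${low:.2f} vs ${high:.2f} per million tokens")
```

Serving 16x as many concurrent streams at the same per-stream speed cuts cost per token by 16x in this idealized model. In practice, per-stream speed often drops as concurrency rises, so real-world gains are smaller.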

Key insights

Etched's architecture leverages low voltage inference and cluster scale memory to achieve significantly faster and cheaper AI model inference.
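One way to see why treating a cluster's memory as a single pool can raise concurrency is the standard memory-bound model of autoregressive decoding: each step reads the model weights once for the whole batch, plus every sequence's KV cache. This is an idealized sketch, not Etched's architecture. It assumes the weights are sharded across chips and read once per step cluster-wide, ignores compute and interconnect time (the latency Etched's custom interconnect is meant to minimize), and uses hypothetical sizes and bandwidths.

```python
GB = 10**9


def max_batch(memory_bytes: int, weight_bytes: int, kv_bytes_per_seq: int) -> int:
    """Largest number of sequences whose KV caches fit beside the weights."""
    return (memory_bytes - weight_bytes) // kv_bytes_per_seq


def decode_tokens_per_sec(batch: int, weight_bytes: int, kv_bytes_per_seq: int,
                          bandwidth_bytes_per_s: float) -> float:
    """Aggregate tokens/s when each decode step is limited by memory traffic.

    One step reads the weights once (shared by the whole batch) plus every
    sequence's KV cache, and produces one token per sequence.
    """
    bytes_per_step = weight_bytes + batch * kv_bytes_per_seq
    step_seconds = bytes_per_step / bandwidth_bytes_per_s
    return batch / step_seconds


# Hypothetical sizes: 140 GB of weights, 1 GB of KV cache per sequence,
# 192 GB of memory and 4 TB/s of bandwidth per chip.
WEIGHTS, KV_PER_SEQ = 140 * GB, 1 * GB
CHIP_MEM, CHIP_BW = 192 * GB, 4e12

single_batch = max_batch(CHIP_MEM, WEIGHTS, KV_PER_SEQ)
single_tps = decode_tokens_per_sec(single_batch, WEIGHTS, KV_PER_SEQ, CHIP_BW)

# Eight chips treated as one memory pool (weights sharded, read once per step).
pool_batch = max_batch(8 * CHIP_MEM, WEIGHTS, KV_PER_SEQ)
pool_tps = decode_tokens_per_sec(pool_batch, WEIGHTS, KV_PER_SEQ, 8 * CHIP_BW)

print(f"1 chip:   {single_batch} sequences, {single_tps:,.0f} tokens/s")
print(f"8 pooled: {pool_batch} sequences, {pool_tps:,.0f} tokens/s")
```

With these made-up numbers, eight pooled chips fit 1,396 concurrent sequences versus 52 on one chip, and aggregate decode throughput rises by about 27x rather than 8x, because the weight read is amortized over a much larger batch.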

Principles

Method

Etched uses prefill-decode disaggregation: prefill (processing the prompt to fill the KV cache) and decode (generating output tokens) run on separate server clusters. The team also applies "prefetching" to its own development process, parallelizing work so that every task that can be finished before the chip arrives is already done at delivery.
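A toy sketch of prefill-decode disaggregation follows. The `PrefillWorker` and `DecodeWorker` classes are hypothetical, integers stand in for key/value tensors, and a trivial arithmetic rule replaces the model. In a real deployment the two pools run on separate servers and the KV cache is transferred over the network. This is not Etched's implementation, which the episode does not detail.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    req_id: int
    prompt: list[int]
    max_new_tokens: int
    kv_cache: list[int] = field(default_factory=list)  # stand-in for per-token K/V tensors
    output: list[int] = field(default_factory=list)


class PrefillWorker:
    """Hypothetical prefill pool: processes the whole prompt in one pass."""

    def run(self, req: Request) -> Request:
        # Fill the KV cache with one toy "K/V" entry per prompt token.
        req.kv_cache = [tok * 2 for tok in req.prompt]
        return req


class DecodeWorker:
    """Hypothetical decode pool: emits one token per step for every active request."""

    def __init__(self) -> None:
        self.active: list[Request] = []

    def admit(self, req: Request) -> None:
        # In a real system the KV cache would arrive over the network here.
        self.active.append(req)

    def step(self) -> list[Request]:
        for req in self.active:
            next_tok = sum(req.kv_cache) % 100  # toy "model" reading the cached state
            req.output.append(next_tok)
            req.kv_cache.append(next_tok * 2)   # decode extends the KV cache
        finished = [r for r in self.active if len(r.output) >= r.max_new_tokens]
        self.active = [r for r in self.active if len(r.output) < r.max_new_tokens]
        return finished


def serve(requests: list[Request]) -> list[Request]:
    prefill, decode = PrefillWorker(), DecodeWorker()
    queue, done = deque(requests), []
    while queue or decode.active:
        if queue:  # hand one prefilled request to the decode pool per step
            decode.admit(prefill.run(queue.popleft()))
        done.extend(decode.step())
    return done


done = serve([Request(0, [1, 2, 3], max_new_tokens=2), Request(1, [5], max_new_tokens=1)])
for r in done:
    print(r.req_id, r.output)
```

Keeping the two phases apart lets the compute-heavy prefill pass and the memory-bandwidth-heavy decode loop be provisioned and batched independently.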

Best for: CTO, VP of Engineering/Data, AI Architect, AI Hardware Engineer, Director of AI/ML, Investor

Editorial summary, takeaway, and curation by AIssential. Original article published by Invest Like the Best with Patrick O'Shaughnessy.