Etched - Building AI Hardware to Make Inference Faster and Cheaper - [Invest Like the Best, EP.480]
Summary
Etched, founded in 2023 by Gavin Uberti and Rob Wachen, has successfully taped out a working AI chip on its first attempt, becoming the first post-ChatGPT hardware company to achieve this. The company, which has raised $800 million and secured over $1 billion in customer demand, develops chips and systems designed to accelerate AI model inference and reduce costs. Their initial product is a complete rack solution, encompassing the chip, boards, power delivery, interconnects, and manufacturing. Etched's architecture relies on two key technical bets: "low voltage inference" to overcome thermal throttling for higher FLOP density, and "cluster scale memory" using custom interconnects to achieve significantly lower chip-to-chip latency and better memory bandwidth utilization across a cluster. This approach aims to enable massive concurrency and faster task completion for AI models.
Key takeaway
For AI Hardware Engineers or Directors of AI/ML planning future data center infrastructure, Etched's specialized approach to inference hardware signals a shift away from general-purpose chips, which the company argues are increasingly inefficient for modern AI workloads. You should evaluate vertically integrated solutions that prioritize "low voltage inference" and "cluster scale memory", which Etched expects to deliver orders of magnitude greater concurrency and lower cost per token. The intended result is faster wall-clock time for complex AI tasks and support for the massive user scale that future AI applications will require.
Key insights
Etched's architecture leverages low voltage inference and cluster scale memory to achieve significantly faster and cheaper AI model inference.
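A rough back-of-the-envelope model shows why memory bandwidth utilization across a cluster affects speed. The sketch below assumes token generation is memory-bandwidth-bound and that model weights are sharded evenly across chips, so each generated token requires one full read of the weights. The function name and every number are hypothetical illustrations, not Etched's figures, and the model ignores interconnect latency and KV cache traffic.

```python
def decode_step_seconds(model_bytes: float,
                        per_chip_bandwidth: float,
                        num_chips: int,
                        utilization: float) -> float:
    """Estimate time for one decode step under a bandwidth-bound assumption.

    model_bytes        -- total bytes of weights read per generated token
    per_chip_bandwidth -- peak memory bandwidth of one chip, in bytes/second
    num_chips          -- chips the weights are sharded across (assumed even)
    utilization        -- fraction of peak bandwidth actually achieved (0..1]
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    effective_bandwidth = per_chip_bandwidth * num_chips * utilization
    return model_bytes / effective_bandwidth


# Hypothetical example: 100 GB of weights, 1 TB/s per chip, 10 chips.
low = decode_step_seconds(100e9, 1e12, 10, utilization=0.5)
high = decode_step_seconds(100e9, 1e12, 10, utilization=0.9)
print(f"50% utilization: {low * 1e3:.1f} ms per token")
print(f"90% utilization: {high * 1e3:.1f} ms per token")
```

Under these assumptions, raising achieved bandwidth utilization shortens each decode step proportionally, which is the lever the "cluster scale memory" bet targets.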
Principles
- Tailor chip design to specific AI inference workloads.
- Prioritize thermal solutions for higher FLOP utilization.
- Vertical integration accelerates product development cycles.
Method
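Etched uses prefill-decode disaggregation: prompt processing (prefill), which builds the KV cache, and token-by-token generation (decode) run on separate groups of servers in the cluster. The team also applies what it calls "prefetching" to its own development process, parallelizing work so that every task that does not depend on finished silicon is completed before the chips are delivered.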
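The split between prefill and decode can be illustrated with a toy sketch. This is a minimal illustration of the general technique, not Etched's implementation: the `KVCache` class, the function names, and the stand-in "model" (a sum modulo 100) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KVCache:
    """Toy stand-in for per-request attention state: one entry per token."""
    entries: List[int] = field(default_factory=list)


def prefill(prompt_tokens: List[int]) -> KVCache:
    """Prefill stage: process the whole prompt in one pass and build the cache."""
    return KVCache(entries=list(prompt_tokens))


def decode(cache: KVCache, max_new_tokens: int) -> List[int]:
    """Decode stage: generate one token per step, extending the cache each time."""
    generated = []
    for _ in range(max_new_tokens):
        # Hypothetical "model": next token is the sum of cached entries mod 100.
        next_token = sum(cache.entries) % 100
        cache.entries.append(next_token)
        generated.append(next_token)
    return generated


def serve(prompt_tokens: List[int], max_new_tokens: int) -> List[int]:
    cache = prefill(prompt_tokens)  # would run on the prefill servers
    # In a disaggregated deployment the cache is transferred between
    # server groups at this point, typically over the cluster interconnect.
    return decode(cache, max_new_tokens)  # would run on the decode servers


print(serve([1, 2, 3], max_new_tokens=2))  # prints [6, 12]
```

The two stages have different profiles: prefill handles many tokens in one pass, while decode repeats a small step many times. Separating them lets each group of servers be sized and scheduled for its own workload.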
In practice
- Implement low voltage power delivery for AI chips.
- Develop custom interconnects for cluster-scale memory.
- Mock thermal profiles to validate cooling solutions early.
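The appeal of lower supply voltage follows from the standard first-order CMOS model, in which dynamic power scales with capacitance, the square of voltage, and clock frequency. The sketch below applies that textbook relation only; the function name and the ratios are hypothetical, and it does not model leakage power or Etched's actual operating points.

```python
def relative_dynamic_power(voltage_ratio: float, frequency_ratio: float = 1.0) -> float:
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f.

    voltage_ratio   -- new supply voltage divided by baseline voltage
    frequency_ratio -- new clock frequency divided by baseline frequency
    Switched capacitance is assumed unchanged; leakage power is ignored.
    """
    if voltage_ratio <= 0 or frequency_ratio <= 0:
        raise ValueError("ratios must be positive")
    return voltage_ratio ** 2 * frequency_ratio


# Hypothetical operating points, for illustration only.
for v in (1.0, 0.9, 0.8, 0.7):
    print(f"V = {v:.1f}x baseline -> dynamic power = {relative_dynamic_power(v):.2f}x")
```

Because of the quadratic term, a modest voltage reduction cuts heat noticeably, which leaves thermal headroom for more active compute in the same area. In practice, lower voltage also limits achievable clock frequency, which is why `frequency_ratio` is a separate parameter here.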
Topics
- AI Hardware
- AI Inference
- Chip Architecture
- Low Voltage Inference
- Cluster Scale Memory
- Vertical Integration
- Data Center Infrastructure
Best for: CTO, VP of Engineering/Data, AI Architect, AI Hardware Engineer, Director of AI/ML, Investor
Editorial summary, takeaway, and curation by AIssential. Original article published by Invest Like the Best with Patrick O'Shaughnessy.