From the Lab to the Frontier: The Story Behind Inception

2026-03-25 · Source: Menlo Ventures · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

Inception, founded by former Stanford, UCLA, and Cornell academics Stefano Ermon, Aditya Grover, and Volodymyr Kuleshov, is pioneering diffusion models for language, a "contrarian bet" against the field's consensus around autoregressive models. According to the article, the founders' research showed that diffusion models could match autoregressive models in text generation quality while running up to 10x faster. That result led them to found Inception and scale the technology beyond what an academic lab could support. The company's flagship product, Mercury-2, is described as the world's first and fastest large-scale reasoning model built on diffusion, generating over 1,000 tokens per second on standard GPUs. Inception argues that diffusion models offer better inference scaling, cost-efficiency, and quality for real-time AI applications such as conversational agents and software development, and it predicts that all LLMs will eventually adopt this parallel inference architecture.

Key takeaway

Directors of AI/ML evaluating next-generation LLM architectures should investigate Inception's diffusion-based models. Mercury-2 is reported to run up to 10x faster than autoregressive models, at over 1,000 tokens per second on standard GPUs, which the company says translates into advantages in speed, cost, and quality. Parallel inference matters most for agentic workloads and real-time applications, where it could reshape infrastructure and product development strategy. Consider piloting diffusion LLMs in latency-sensitive AI deployments to gain a competitive edge.

Key insights

Inception's thesis is that diffusion models give language a parallel inference architecture that beats autoregressive LLMs on speed and cost.

Principles

Diffusion models enable 5-10x faster inference for text generation than autoregressive models.
Parallel inference unlocks new scaling laws for LLMs.
Academic breakthroughs require commercial scaling for real-world impact.
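
To make the parallel-inference principle concrete, here is a toy count of sequential model calls, not Inception's method. It assumes an autoregressive decoder makes one call per generated token, while a parallel decoder finalizes several positions per call. The function names and the `tokens_per_pass` parameter are hypothetical and chosen only for illustration; real diffusion LLMs refine tokens in more involved ways.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """Sequential model calls when each call emits exactly one token."""
    return num_tokens


def parallel_passes(num_tokens: int, tokens_per_pass: int) -> int:
    """Sequential model calls when each call finalizes up to
    `tokens_per_pass` positions at once (hypothetical parameter)."""
    if tokens_per_pass < 1:
        raise ValueError("tokens_per_pass must be at least 1")
    return math.ceil(num_tokens / tokens_per_pass)


if __name__ == "__main__":
    n = 256
    ar = autoregressive_passes(n)
    par = parallel_passes(n, tokens_per_pass=16)
    print(f"autoregressive: {ar} sequential calls")
    print(f"parallel:       {par} sequential calls")
```

The sketch only counts sequential steps. Actual speedups also depend on how expensive each call is and on how many tokens a real model can finalize per pass without losing quality.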

Method

Inception combines diffusion training with reinforcement learning to create large-scale reasoning models like Mercury-2, optimizing for quality and speed.

In practice

Deploy diffusion LLMs for real-time conversational agents.
Use faster iteration loops in AI-assisted software development.
Optimize agentic workloads requiring low latency.
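
As a rough back-of-the-envelope sketch of why throughput matters for these workloads, generation time can be estimated as tokens divided by tokens per second. The 1,000 tokens per second figure comes from the summary above. The 100 tokens per second baseline is an assumption obtained by dividing that figure by 10, and the agent shape (8 sequential steps of 500 tokens each) is hypothetical. Prompt processing, network, and tool-call time are ignored.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate `num_tokens` at a steady decode throughput."""
    return num_tokens / tokens_per_second


def agent_loop_seconds(steps: int, tokens_per_step: int,
                       tokens_per_second: float) -> float:
    """Total generation time for an agent that makes `steps` sequential
    model calls, each producing `tokens_per_step` tokens."""
    return steps * generation_seconds(tokens_per_step, tokens_per_second)


if __name__ == "__main__":
    fast = 1000.0   # tokens/s, figure quoted in the summary
    slow = 100.0    # tokens/s, assumed 10x slower baseline
    for label, rate in (("diffusion (quoted)", fast), ("baseline (assumed)", slow)):
        total = agent_loop_seconds(steps=8, tokens_per_step=500,
                                   tokens_per_second=rate)
        print(f"{label}: {total:.1f} s of generation time")
```

Under these assumptions the eight-step loop spends 4 seconds generating at the quoted rate and 40 seconds at the assumed baseline, which is the kind of gap that separates an interactive agent from a batch job.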

Topics

Diffusion Models
Large Language Models
AI Inference
Generative AI
Parallel Computing
AI Startups
Mercury-2

Best for: AI Engineer, Machine Learning Engineer, NLP Engineer, Investor, Entrepreneur, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Menlo Ventures.