😺 🎙️ Mercury 2: AI that's 10x faster than ChatGPT & Claude

Source: The Neuron · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering) · Depth: Intermediate, long

Summary

Inception Labs has launched Mercury 2, a reasoning model built on diffusion rather than the autoregressive approach used by models like ChatGPT, Claude, and Gemini. Where traditional LLMs generate text one token at a time, Mercury 2 produces an entire answer at once and then refines it. This parallel generation reaches a throughput of 1,000 tokens per second on NVIDIA Blackwell GPUs, approximately 10 times faster than Claude 4.5 Haiku and GPT 5.2 Mini, at comparable quality. Pricing is also significantly lower, at $0.25 per million input tokens and $0.75 per million output tokens, and the model offers a 128K context window, full tool use, and JSON output support. Together with energy-based models from Logical Intelligence, this launch signals a shift away from the memory-bound, one-token-at-a-time bottleneck of current large language models.
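To make the quoted figures concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the prices and throughput stated in the summary. The function names and the example request size are illustrative assumptions, and the estimate ignores time to first token, network latency, and any batching effects.

```python
# Figures quoted in the summary; everything else here is an illustrative assumption.
INPUT_USD_PER_MILLION = 0.25    # price per million input tokens
OUTPUT_USD_PER_MILLION = 0.75   # price per million output tokens
TOKENS_PER_SECOND = 1_000       # quoted throughput on NVIDIA Blackwell GPUs


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted per-token prices."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


def generation_seconds(output_tokens: int,
                       tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Estimate pure generation time, ignoring latency outside the model."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens and a 500-token answer.
    print(f"cost: ${request_cost_usd(2_000, 500):.6f}")
    print(f"time at 1,000 tok/s: {generation_seconds(500):.1f} s")
    # A model running at one tenth of the throughput, for comparison.
    print(f"time at 100 tok/s: {generation_seconds(500, 100):.1f} s")
```

Under these assumptions, a 500-token answer takes about half a second to generate, against about five seconds for a model running at one tenth of the throughput. That gap is what matters most for interactive use.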

Key takeaway

For AI/ML Directors evaluating LLM infrastructure, Mercury 2 is a compelling alternative to traditional autoregressive models. Its diffusion-based architecture offers roughly 10x higher throughput and significantly lower costs, which suits latency-sensitive applications such as coding IDEs, voice agents, and customer support. Benchmark Mercury 2's speed and cost against your own use cases, especially where speed and efficiency are critical, and weigh its potential to reduce operational expenses.

Key insights

By processing tokens in parallel, Mercury 2's diffusion approach delivers roughly a 10x speedup, along with lower costs, over comparable autoregressive LLMs.

Principles

Method

Mercury 2 generates a complete text answer and then refines it, contrasting with the sequential, one-token-at-a-time generation of traditional autoregressive LLMs.
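The contrast can be illustrated with a toy sketch that counts sequential steps. This is not Inception Labs' algorithm. Both functions are hypothetical stand-ins that copy from a known target instead of calling a model, and a real diffusion language model uses a learned denoiser that may also revise positions it has already filled. The sketch shows only that token-by-token decoding needs one dependent step per token, while draft-and-refine decoding needs a fixed number of passes that each update many positions at once.

```python
import math


def autoregressive_decode(target):
    """Toy stand-in: emit one token per sequential step."""
    output, steps = [], 0
    for token in target:  # each step must wait for the previous one
        output.append(token)
        steps += 1
    return output, steps


def parallel_refine_decode(target, num_steps):
    """Toy stand-in: start from a blank draft and fill a batch of positions per pass."""
    draft = [None] * len(target)  # None marks a position not yet filled
    per_step = math.ceil(len(target) / num_steps)
    steps = 0
    while None in draft:
        unfilled = [i for i, tok in enumerate(draft) if tok is None]
        for i in unfilled[:per_step]:  # updates within a pass are independent
            draft[i] = target[i]
        steps += 1
    return draft, steps


if __name__ == "__main__":
    target = "the quick brown fox jumps over the lazy dog".split()
    _, ar_steps = autoregressive_decode(target)
    _, pr_steps = parallel_refine_decode(target, num_steps=3)
    print(f"autoregressive: {ar_steps} sequential steps")
    print(f"parallel refine: {pr_steps} sequential steps")
```

In the toy, the nine-token sentence takes nine sequential steps one way and three the other. In a real system the saving depends on how many refinement passes the model needs and how much each pass costs.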

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by The Neuron.