[AINews] Cognition raises $1B in $26B Series D

2026-05-28 · Source: Latent.Space (www.latent.space) · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, long

Summary

Cognition, an independent AI agent lab, raised over $1B in a Series D round at a $26B valuation, a 2.5x increase in valuation in eight months. Its current run-rate revenue is $492M, and it projects over $1B in ARR by the end of the year. The wider AI landscape is moving quickly at the same time. Inference optimization is shifting toward architectural techniques such as speculative decoding and KV-cache management, which make API price reductions sustainable. Agent development emphasizes "model-harness-memory fit", with tools like LangChain's Deep Agents and the emergence of continual learning platforms. New benchmarks, including DeepSWE and ITBench-AA, focus on complex, real-world workflows. Training research introduces innovations like Sakana AI's DiffusionBlocks. Key model releases include ESMFold2 for protein prediction, Gemini Embedding 2 for multimodal embeddings, and Surya OCR 2. Developer platforms are integrating coding agents into full product stacks with advanced enterprise controls, as seen in the OpenAI and Claude Code updates. Local AI is also advancing through low-bit models, optimized inference engines, and new Qwen 3.5/3.6 local model releases.

Key takeaway

AI scientists and machine learning engineers optimizing model deployment should prioritize architectural inference improvements, such as KV-cache management and attention design, to achieve sustainable cost reductions. Evaluate agentic systems on "model-harness-memory fit" rather than model quality alone, and integrate continual learning platforms for post-deployment model adaptation. When deploying local LLMs, consider low-bit quantization and optimized inference engines for consumer hardware. Together, these shifts call for a holistic approach to system design that looks beyond isolated model performance.

Key insights

AI development is rapidly maturing, driven by architectural inference optimizations, agentic system integration, and specialized model releases.

Principles

Inference cost cuts stem from attention design and cache hierarchy.
Task-harness fit is crucial for agent performance.
Post-deployment learning is becoming standard infra.

Method

DeepSeek V4-Pro uses hybrid attention (Compressed Sparse Attention and Heavily Compressed Attention) to cut KV cache size to roughly 10% of its baseline and single-token inference FLOPs to 27% of baseline at 1M-token contexts.
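
To give a sense of scale for a ~10% KV cache at 1M tokens, here is a back-of-the-envelope sketch in Python. The layer count, KV head count, head dimension, and fp16 storage are hypothetical values chosen for illustration, not DeepSeek V4-Pro's actual configuration, and the baseline is assumed to be plain attention that caches full keys and values in every layer.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence.

    Assumes standard attention: every layer stores one key vector and one
    value vector per KV head for every token in the context.
    """
    # Factor of 2: one tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len


# Hypothetical model shape (an assumption, not taken from the article).
baseline = kv_cache_bytes(
    seq_len=1_000_000, n_layers=60, n_kv_heads=8, head_dim=128
)

# Apply the ~10% figure quoted above to the assumed baseline.
compressed = baseline // 10

GIB = 1024 ** 3
print(f"assumed baseline cache: {baseline / GIB:.1f} GiB")
print(f"cache at ~10%:          {compressed / GIB:.1f} GiB")
```

Under these assumed dimensions, the uncompressed cache for a single 1M-token sequence would exceed the memory of any single current accelerator, which is why a tenfold reduction changes what can be served rather than only what it costs.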

In practice

Use MTP/speculative decoding for faster local LLM inference.
Periodically summarize and reset agent sessions for long tasks.
Explore NVFP4 GGUF builds for NVIDIA low-precision inference.
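
The speculative decoding tip can be illustrated with a minimal greedy sketch. It uses a separate small draft model rather than MTP heads, and the two toy "models" are hypothetical integer functions standing in for real LLMs, so this shows the control flow only, not any engine's API. In a real engine the verification step is one batched forward pass of the target model; here it is simulated with repeated calls.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # greedy next-token function


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Reference: plain greedy decoding, one model call per new token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (sequence, verification rounds)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))

        # 2. The target checks the proposal (one batched pass in practice).
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if expected == tok:
                accepted.append(tok)       # draft guessed correctly
            else:
                accepted.append(expected)  # replace first mismatch, stop
                break
        else:
            # Every drafted token matched: the target's pass also yields
            # one extra token, if there is still room.
            if len(seq) + len(accepted) < goal:
                accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq, rounds


# Hypothetical toy models: the draft agrees with the target most of the time.
def target_model(seq: List[int]) -> int:
    return (3 * seq[-1] + len(seq)) % 11


def draft_model(seq: List[int]) -> int:
    t = target_model(seq)
    return t if len(seq) % 4 else (t + 1) % 11


out, rounds = speculative_decode(target_model, draft_model, [1, 2], n_new=12)
print(out == greedy_decode(target_model, [1, 2], 12), rounds)
```

Because the target model overrides the first mismatched token, the output is identical to target-only greedy decoding. The speedup comes from needing fewer target passes than new tokens, which depends on how often the draft agrees with the target.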

Topics

AI Agent Labs
Inference Optimization
Continual Learning
LLM Benchmarks
Multimodal Models
Low-Bit Quantization
Coding Agents

Code references

cluaiz/cluaiz

Best for: Investor, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Latent.Space (www.latent.space).