[AINews] New AI Infra decacorns: Fireworks, Baseten (with OpenRouter on the way)

· Source: Latent.Space - Www.latent.space · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Robotics & Autonomous Systems · Depth: Expert, long

Summary

Recent reports point to an "Inference Inflection" in AI infrastructure: Fireworks and Baseten are reportedly nearing decacorn valuations of \$15B and \$11B, respectively. OpenRouter secured a \$113M Series B and grew weekly volume from 5T to 25T tokens in six months, underscoring demand for multi-model inference routing. Concurrently, AI agent development is shifting toward a "model + harness + eval loop" paradigm, with DeepSeek building harness teams and new benchmarks such as DeepSWE emerging for agentic coding. Research agents show latent capabilities when given appropriate harnesses, while "Language Models Need Sleep" proposes a context consolidation phase for long-horizon memory. Other advances include the AMUSE optimizer, MiniMax M3 sparse attention, and new vision models. Infrastructure concerns are also rising, including datacenter power and a potential inference compute crunch. On the local LLM side, reports around Qwen 3.6 describe strong local agentic workflows and VRAM optimization techniques.
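To put the OpenRouter volume figure in perspective, the following is a rough back-of-the-envelope sketch of the implied growth rate. It assumes smooth compound growth over six equal monthly periods (or 26 weekly periods), which the source does not state; the function name `compound_growth_rate` is illustrative only.

```python
import math


def compound_growth_rate(start: float, end: float, periods: int) -> float:
    """Per-period rate r such that start * (1 + r) ** periods == end."""
    if start <= 0 or end <= 0 or periods <= 0:
        raise ValueError("start, end, and periods must be positive")
    return (end / start) ** (1.0 / periods) - 1.0


# Figures from the summary: weekly volume of 5T -> 25T tokens in six months.
start_tokens = 5e12
end_tokens = 25e12

# Assumption: growth compounds evenly; six months is treated as 6 or 26 periods.
monthly = compound_growth_rate(start_tokens, end_tokens, periods=6)
weekly = compound_growth_rate(start_tokens, end_tokens, periods=26)

# Months needed to double at the implied monthly rate.
doubling_months = math.log(2) / math.log(1 + monthly)

print(f"implied monthly growth: {monthly:.1%}")
print(f"implied weekly growth:  {weekly:.1%}")
print(f"doubling time (months): {doubling_months:.1f}")
```

Under these assumptions, a 5x increase in six months works out to roughly 31% compound growth per month, or a doubling about every 2.6 months.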

Key takeaway

For machine learning engineers building production AI systems, prioritize robust inference infrastructure and agentic harness development rather than focusing solely on base model strength. Incorporate tools such as OpenRouter for multi-model inference, and consider techniques such as context consolidation for long-horizon agents. Evaluate new benchmarks such as DeepSWE for agentic coding, and optimize local LLM deployments with methods such as ik_llama.cpp or VRAM-saving display configurations to maximize throughput and resource efficiency.
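As a generic illustration of what multi-model inference routing can look like at the application layer, here is a minimal fallback sketch. It does not use or represent OpenRouter's API: the `Backend` type, `route_with_fallback`, and both stand-in model functions are hypothetical, and a real deployment would replace them with actual client calls.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical backend type: takes a prompt, returns text, raises on failure.
Backend = Callable[[str], str]


def route_with_fallback(
    prompt: str, backends: Sequence[Tuple[str, Backend]]
) -> Tuple[str, str]:
    """Try each (name, backend) in order; return (name, reply) from the first success."""
    errors: List[str] = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except Exception as exc:  # record the failure and try the next model
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stand-in models for demonstration only; no network calls are made.
def flaky_model(prompt: str) -> str:
    raise TimeoutError("simulated timeout")


def stable_model(prompt: str) -> str:
    return f"echo: {prompt}"


chosen, reply = route_with_fallback(
    "hello", [("flaky", flaky_model), ("stable", stable_model)]
)
print(chosen, reply)
```

Ordering the backend list by preference keeps the policy explicit; a production router would typically also weigh cost, latency, and rate limits, which this sketch leaves out.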

Key insights

The AI landscape is rapidly maturing, shifting focus from raw model power to robust inference infrastructure and sophisticated agentic harnesses.

Method

Agentic workflows can convert repeatable procedures into "skills" for tasks such as DevOps or code generation, with a managing process that spawns fresh-context sub-agents to carry them out.
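The pattern described above can be sketched roughly as follows. This is a toy model under stated assumptions, not any particular framework's design: `Skill`, `SubAgent`, and `Orchestrator` are hypothetical names, and each "step" is simply recorded rather than executed by a model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Skill:
    """A repeatable procedure captured as an ordered list of steps."""
    name: str
    steps: List[str]


@dataclass
class SubAgent:
    """A worker that starts with an empty context for every task."""
    skill: Skill
    context: List[str] = field(default_factory=list)

    def run(self, task: str) -> List[str]:
        self.context.append(f"task: {task}")
        for step in self.skill.steps:
            # Placeholder for real work (e.g. a model call or a shell command).
            self.context.append(f"did: {step}")
        return self.context


class Orchestrator:
    """Managing process: holds the skills and spawns one sub-agent per task."""

    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def dispatch(self, skill_name: str, task: str) -> List[str]:
        # A new SubAgent per dispatch means no context leaks between tasks.
        agent = SubAgent(skill=self.skills[skill_name])
        return agent.run(task)


orch = Orchestrator()
orch.register(Skill("deploy", ["run tests", "build image", "roll out"]))
first = orch.dispatch("deploy", "ship service A")
second = orch.dispatch("deploy", "ship service B")
print(second)
```

Because every dispatch builds a new sub-agent, the second task's context contains nothing from the first, which is the property the "fresh-context" wording refers to.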

Best for: Investor, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Latent.Space - Www.latent.space.