[AINews] Open Models, Model Labs vs Agent Labs, and What's Untrainable — Sarah Guo
Summary
This AI news brief covers June 9-10, 2026. Sarah Guo's article, "The Untrainable," examines the role of open models, the distinction between model labs and agent labs, the value of verifiable benchmarks, and "intent" as the critical, untrainable input in AI development. Meanwhile, Anthropic's Fable/Mythos rollout drew significant backlash over silent performance degradation on AI research prompts and 30-day data retention policies, even though Fable 5 posted strong benchmark results, including #1 on Agent Arena and 81.9% on SimpleBench. Google released DiffusionGemma, an experimental 26B MoE diffusion text model under Apache 2.0, claiming up to 4x faster output and 1,000+ tokens/sec, with vLLM support showing 1,200+ tokens/sec. The brief also covers agent tooling, including trace-based benchmarks such as Agent Arena and new memory and orchestration solutions, along with progress in optimization and scientific modeling.
Key takeaway
Machine learning engineers deploying frontier models should prioritize continuous verification of API outputs and maintain model portability. Anthropic's Fable 5 demonstrates strong agentic capabilities, but its opaque changes and data retention policies show why external APIs should be treated as unstable dependencies. Consider Google's DiffusionGemma for non-sequential decoding tasks, and integrate trace-based agent benchmarks to assess complex agentic workflows objectively and reduce the risk of relying on unverifiable model behavior.
Key insights
Open models, agentic capabilities, and new architectures are advancing quickly, but trust and "intent" remain critical challenges.
Principles
- Verifiable benchmarks are crucial for assessing model capabilities.
- "Intent" is an untrainable, scarce input in AI development.
- Opaque model changes erode trust and hinder reproducibility.
Method
Agent Arena evaluates agent performance objectively from long-horizon traces, mining them for bash errors, tool hallucination, and "insanity" signals.
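As a rough illustration of trace mining, the sketch below counts two of the signals named above in a recorded agent run. It is not Agent Arena's implementation: the trace format (a list of step dicts with `tool`, `exit_code`, and `output` keys), the `KNOWN_TOOLS` set, the error patterns, and the `mine_trace` function are all assumptions made for this example. The "insanity" signal is left out because the source does not define it.

```python
import re
from collections import Counter

# Hypothetical tool registry: any tool name outside this set is treated
# as a hallucinated tool call.
KNOWN_TOOLS = {"bash", "read_file", "write_file"}

# Illustrative shell failure messages; a real harness would likely use more.
BASH_ERROR_PATTERNS = [
    re.compile(p)
    for p in (
        r"command not found",
        r"No such file or directory",
        r"Permission denied",
    )
]


def mine_trace(steps, known_tools=KNOWN_TOOLS):
    """Count failure signals in one trace (assumed format: list of dicts)."""
    signals = Counter()
    for step in steps:
        tool = step.get("tool")
        if tool not in known_tools:
            # The agent called a tool that does not exist in the registry.
            signals["tool_hallucination"] += 1
            continue
        if tool == "bash":
            output = step.get("output", "")
            failed_exit = step.get("exit_code", 0) != 0
            failed_text = any(p.search(output) for p in BASH_ERROR_PATTERNS)
            if failed_exit or failed_text:
                signals["bash_error"] += 1
    return dict(signals)


if __name__ == "__main__":
    trace = [
        {"tool": "bash", "exit_code": 2,
         "output": "ls: cannot access '/nope': No such file or directory"},
        {"tool": "web_search_v9", "output": ""},
        {"tool": "bash", "exit_code": 0, "output": "hi"},
    ]
    print(mine_trace(trace))
```

Counting signals per trace keeps the evaluation tied to observable behavior in the run itself, which is the sense in which trace-based scoring is verifiable.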
In practice
- Treat frontier APIs as unstable dependencies.
- Maintain model portability across different providers.
- Continuously verify model outputs with evals and harnesses.
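The practices above can be sketched as a small provider-agnostic eval harness. This is a minimal sketch under stated assumptions, not a reference design: `ModelClient`, `EvalCase`, `pass_rate`, `has_regressed`, and the `CannedClient` stand-in are hypothetical names introduced here, and the 0.05 tolerance is an arbitrary illustrative default. A real setup would wrap each provider's SDK behind the same `complete` method.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Sequence


class ModelClient(Protocol):
    """Minimal interface every provider wrapper is assumed to implement."""

    def complete(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def pass_rate(client: ModelClient, cases: Sequence[EvalCase]) -> float:
    """Fraction of eval cases whose output passes its check."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for case in cases if case.check(client.complete(case.prompt)))
    return passed / len(cases)


def has_regressed(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Flag a drop below the recorded baseline by more than the tolerance."""
    return current < baseline - tolerance


class CannedClient:
    """Stand-in for a real provider wrapper; returns fixed answers."""

    def __init__(self, answers: dict[str, str]) -> None:
        self._answers = answers

    def complete(self, prompt: str) -> str:
        return self._answers.get(prompt, "")


if __name__ == "__main__":
    cases = [
        EvalCase("2+2?", lambda out: "4" in out),
        EvalCase("Capital of France?", lambda out: "Paris" in out),
    ]
    provider = CannedClient({"2+2?": "4"})  # simulates a silently degraded model
    rate = pass_rate(provider, cases)
    print(rate, has_regressed(rate, baseline=1.0))
```

Because the harness depends only on the `complete` method, the same eval cases can be rerun on a schedule against one provider to catch silent changes, or against a second provider to check that a migration is viable.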
Topics
- Open Models
- Agentic AI
- Diffusion Models
- AI Benchmarking
- Model Trust & Governance
- LLM Optimization
Best for: AI Engineer, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Latent.Space (www.latent.space).