[AINews] Open Models, Model Labs vs Agent Labs, and What's Untrainable — Sarah Guo

Source: Latent.Space (www.latent.space) · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, medium

Summary

The AI intelligence brief for June 9-10, 2026 highlights several key developments. Sarah Guo's article, "The Untrainable," examines the role of open models, the distinction between agent labs and model labs, the value of verifiable benchmarks, and "intent" as the critical, untrainable element of AI development. Anthropic's Fable/Mythos rollout drew significant backlash over silent performance degradation on AI research prompts and a 30-day data retention policy, even though Fable 5 posted strong benchmark results, including #1 on Agent Arena and 81.9% on SimpleBench. Google released DiffusionGemma, an experimental 26B MoE diffusion text model under Apache 2.0, claiming up to 4x faster output and 1,000+ tokens/sec; vLLM support is reported at 1,200+ tokens/sec. The brief also covers agent tooling, including trace-based benchmarks such as Agent Arena and new memory and orchestration solutions, along with progress in optimization and scientific modeling.

Key takeaway

Machine Learning Engineers deploying frontier models should prioritize continuous verification of API outputs and maintain model portability. Anthropic's Fable 5 demonstrates strong agentic capabilities, but its opaque behavior changes and data retention policy show why external APIs should be treated as unstable dependencies. Explore Google's DiffusionGemma for non-sequential decoding tasks, and integrate trace-based agent benchmarks to assess complex agentic workflows objectively, which reduces the risk of relying on unverifiable model behavior.
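
One way to make "continuous verification of API outputs" concrete is a small canary suite that is re-run on a schedule against whichever provider is in use. The sketch below is illustrative only: Case, run_canaries, fake_model, and the two example prompts are hypothetical names introduced here, not part of any vendor SDK or of the original article. The model is passed in as a plain callable, which also keeps the checks portable across providers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    """One canary: a fixed prompt plus a predicate over the model's reply."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # True when the output is acceptable


def run_canaries(call_model: Callable[[str], str], cases: List[Case]) -> List[str]:
    """Run every case and return the names of those that fail.

    call_model is any function mapping a prompt to a text reply, so the
    same suite can be pointed at a different provider without changes.
    """
    failures = []
    for case in cases:
        try:
            ok = case.check(call_model(case.prompt))
        except Exception:
            # A transport error or a crashing check counts as a failure.
            ok = False
        if not ok:
            failures.append(case.name)
    return failures


# Stand-in for a real API client (assumption: replace with your own wrapper).
def fake_model(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "I can't help with that."


CASES = [
    Case("arithmetic", "What is 2 + 2? Answer with a number only.",
         lambda out: out.strip() == "4"),
    Case("research_prompt", "Summarize the idea behind gradient checkpointing.",
         lambda out: "checkpoint" in out.lower()),
]

failed = run_canaries(fake_model, CASES)
print("failing canaries:", failed)
```

With the stand-in model, the research-style prompt is the one that fails, which is the kind of silent change a scheduled run of such a suite is meant to surface.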

Key insights

The AI landscape is rapidly evolving with open models, agentic capabilities, and new architectures, but trust and "intent" remain critical challenges.

Method

Agent Arena uses long-horizon traces to objectively evaluate agent performance, mining for bash errors, tool hallucination, and "insanity" signals.
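
The kind of trace mining described above can be sketched as a single pass over an agent's event log. This is not Agent Arena's implementation: the event schema (type, tool, args, exit_code), the function name mine_trace, and the tool names are assumptions made for illustration. The summary does not define the "insanity" signals, so the sketch substitutes a simple stand-in, counting runs of identical consecutive tool calls.

```python
from collections import Counter
from typing import Dict, List, Set


def mine_trace(events: List[Dict], known_tools: Set[str],
               repeat_limit: int = 3) -> Counter:
    """Count failure signals in one agent trace (hypothetical schema)."""
    signals = Counter()
    last_call = None
    streak = 0
    for event in events:
        kind = event.get("type")
        if kind == "tool_call":
            tool = event.get("tool")
            # A call to a tool that was never offered to the agent.
            if tool not in known_tools:
                signals["tool_hallucination"] += 1
            # Stand-in for an "insanity" signal: the same call repeated
            # repeat_limit times in a row (results in between are ignored).
            call = (tool, event.get("args"))
            streak = streak + 1 if call == last_call else 1
            last_call = call
            if streak == repeat_limit:
                signals["repeated_call"] += 1
        elif kind == "tool_result":
            # A bash command that exited with a nonzero status.
            if event.get("tool") == "bash" and event.get("exit_code", 0) != 0:
                signals["bash_error"] += 1
    return signals


def call(tool: str, args: str) -> Dict:
    return {"type": "tool_call", "tool": tool, "args": args}


def result(tool: str, exit_code: int) -> Dict:
    return {"type": "tool_result", "tool": tool, "exit_code": exit_code}


TRACE = [
    call("bash", "ls /data"), result("bash", 0),
    call("bash", "cat missing.txt"), result("bash", 1),
    call("web_search_pro", "agent benchmarks"),  # not in the known tool set
    call("bash", "pytest"), result("bash", 1),
    call("bash", "pytest"), result("bash", 1),
    call("bash", "pytest"), result("bash", 1),
]

signals = mine_trace(TRACE, known_tools={"bash", "read_file"})
for name in sorted(signals):
    print(name, signals[name])
```

Counting signals per trace keeps the evaluation tied to what the agent actually did over a long run, which is the appeal of trace-based benchmarks over single-turn scoring.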

Best for: AI Engineer, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Latent.Space (www.latent.space).