not much happened today
Summary
The AI news recap for May 28-29, 2026, highlights several key developments across proprietary and open-source AI. Anthropic rolled out Claude Opus 4.8, showing incremental gains in coding and platform-level changes like mid-conversation system instructions, though benchmarks were mixed and pricing remains a concern. In agent infrastructure, a critical "Token-In, Token-Out" bug was identified in multi-turn RL training loops, and new work on Effective Feedback Compute (EFC) demonstrated harness quality's significant impact on agent success. Open-weight models continue to gain momentum, now used by 1 in 3 AI teams and lagging frontier models by about four months, with llama.app launching for easier local deployment. Google expanded its managed agent stack with Gemini Spark and Flow Agent, while OpenAI enhanced Codex with Windows support and mobile remote steering, signaling a trend towards vertically integrated agent environments. Additionally, a Starlette vulnerability (CVE-2026-48710) exposed LLM infrastructure to risks, and new research explored bidirectional search, continual learning, and multimodal world models.
Key takeaway
For AI Engineers deploying agentic systems, you should prioritize robust infrastructure and vigilant cost management. Verify your RL training loops adhere to a "Token-In, Token-Out" rule to avoid subtle re-tokenization bugs, and actively monitor prompt cache hit-rates given Anthropic's reduced TTL. Furthermore, immediately check your Starlette dependency for CVE-2026-48710 to mitigate critical security vulnerabilities in your LLM infrastructure, especially for internet-exposed services.
Key insights
Agentic AI development is converging on integrated stacks, robust infrastructure, and careful token economics for performance and cost.
Principles
- "Token-In, Token-Out" prevents RL re-tokenization bugs.
- Harness quality significantly impacts agent success (R² up to 0.99).
- Local-first inference is critical for responsive voice agents.
Method
The proposed fix for multi-turn RL bugs is a strict "Token-In, Token-Out" rule, maintaining a single token buffer across turns to prevent re-encoding sampled tokens and misapplying gradients.
In practice
- Audit prompt cache hit-rates due to reduced TTL (5 min).
- Use billing caps/alerts to prevent runaway token consumption.
- Check Starlette versions for CVE-2026-48710 exposure.
Topics
- AI Agents
- Large Language Models
- Local AI
- RL Training
- API Security
- Token Economics
- Claude Opus
Code references
Best for: CTO, VP of Engineering/Data, Director of AI/ML, Machine Learning Engineer, AI Engineer, AI Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by AINews.