The last six months in LLMs in five minutes
Summary
The six months from November 2025 to April 2026 marked an inflection point for Large Language Models (LLMs), particularly for coding applications. In November 2025 the "best" model title shifted five times among Anthropic, OpenAI, and Google, with models such as Claude Sonnet 4.5, GPT-5.1, Gemini 3, GPT-5.1 Codex Max, and Claude Opus 4.5. Crucially, coding agents, improved by Reinforcement Learning from Verifiable Rewards (RLVR), moved from "often-work" to "mostly-work" and became reliable daily tools. In February 2026 the "Warelay" project evolved into "OpenClaw," a personal AI assistant that popularized the generic term "Claws" and drove Mac Mini sales. February also brought Gemini 3.1 Pro, which demonstrated remarkable image generation capabilities. April 2026 introduced Google's Gemma 4 series and GLM-5.1, a 1.5TB open-weight model, alongside Qwen3.6-35B-A3B, a 20.9GB open-weight model that outperformed Claude Opus 4.7 on specific benchmarks. The period's two key themes are the dramatic improvement of coding agents and the unexpectedly strong performance of local models that run on a laptop.
Key takeaway
For AI engineers evaluating LLM integration, the pace of change from November 2025 to April 2026 makes continuous re-evaluation of model choices critical. Advanced coding agents are now reliable enough to use as daily drivers, which can significantly boost productivity. It is also worth exploring capable open-weight models such as Gemma 4 or Qwen3.6-35B-A3B for local deployment: their performance now far exceeds prior expectations, which may reduce reliance on costly cloud APIs.
Key insights
LLM capabilities, particularly those of coding agents and local models, advanced rapidly from November 2025 to April 2026, and the "best" model title changed hands frequently.
Principles
- LLM performance leadership is highly volatile, changing hands frequently.
- Specialized training such as Reinforcement Learning from Verifiable Rewards (RLVR) improves coding-agent quality.
- Unconventional benchmarks can reveal how well a model actually generalizes.
Method
Reinforcement Learning from Verifiable Rewards (RLVR), which trains a model against rewards that can be checked automatically, was applied to improve the quality of code produced by LLM agents.
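To make the idea of a "verifiable reward" concrete, here is a minimal sketch of a reward function for generated code. It is an illustration only: the source does not describe how any lab computes its rewards, and the names `verifiable_reward`, `func_name`, and `cases` are hypothetical. The assumption is that a candidate solution scores 1.0 if it passes every test case and 0.0 otherwise.

```python
from typing import List, Tuple


def verifiable_reward(
    candidate_source: str,
    func_name: str,
    cases: List[Tuple[tuple, object]],
) -> float:
    """Return 1.0 if the candidate passes every test case, else 0.0.

    Hypothetical helper: a real training pipeline would run candidates in a
    sandbox with time and memory limits rather than calling exec() directly.
    """
    namespace: dict = {}
    try:
        # Define the candidate function in an isolated namespace.
        exec(candidate_source, namespace)
        func = namespace[func_name]
        for args, expected in cases:
            if func(*args) != expected:
                return 0.0
    except Exception:
        # Syntax errors, missing functions, and runtime errors all score zero.
        return 0.0
    return 1.0


# Two candidate completions for the same task: one correct, one buggy.
cases = [((2, 3), 5), ((-1, 1), 0)]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"

rewards = [verifiable_reward(src, "add", cases) for src in (good, bad)]
print(rewards)  # [1.0, 0.0]
```

In an RLVR-style setup, a score like this would feed a policy update that makes passing completions more likely; the update step itself is omitted here.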
In practice
- Integrate advanced coding agents into daily development workflows.
- Evaluate open-weight models like Gemma 4 or Qwen3.6-35B-A3B for local deployment.
- Consider dedicated hardware like Mac Minis for running personal AI assistants.
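As a first pass at evaluating local deployment, the rough check below compares a model's weight size against available memory. It is a sketch under stated assumptions: the model sizes are the ones quoted in the summary (with 1.5TB taken as 1500GB), while the 64GB machine, the 25% headroom, and the function name `fits_in_memory` are hypothetical. Real memory use also depends on quantization, context length, and runtime overhead.

```python
def fits_in_memory(weights_gb: float, ram_gb: float, headroom: float = 0.25) -> bool:
    """True if the weights fit in RAM while leaving a `headroom` fraction free.

    Hypothetical rule of thumb: reserves part of memory for the OS, the
    KV cache, and other processes.
    """
    return weights_gb <= ram_gb * (1.0 - headroom)


# Weight sizes as quoted in the summary, in GB.
models = {"Qwen3.6-35B-A3B": 20.9, "GLM-5.1": 1500.0}
ram_gb = 64.0  # assumed memory of a hypothetical laptop

report = {name: fits_in_memory(size, ram_gb) for name, size in models.items()}
print(report)  # {'Qwen3.6-35B-A3B': True, 'GLM-5.1': False}
```

A check like this only rules models in or out on size; whether a model that fits is good enough for a given task still needs benchmarking on that task.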
Topics
- Large Language Models
- Coding Agents
- Open-weight Models
- Model Benchmarking
- Personal AI Assistants
- Reinforcement Learning from Verifiable Rewards
Best for: NLP Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.