Last Week in AI #335 - Opus 4.6, Codex 5.3, Gemini 3 Deep Think, GLM 5, Seedance 2.0

2024-03-11 · Source: Last Week in AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Emerging Technologies & Innovation · Depth: Intermediate, long

Summary

Anthropic released Claude Opus 4.6, a significant upgrade featuring "agent teams" for parallel task execution and an expanded 1-million-token context window that supports work over large codebases and documents. It also includes a native PowerPoint side panel for drafting and editing slides directly. OpenAI unveiled GPT-5.3-Codex, a frontier coding model available via CLI, IDE extension, web, and a new macOS app, which outperforms previous versions on SWE-Bench Pro and Terminal-Bench 2.0 while running 25% faster. Google introduced Gemini 3 Deep Think, a specialized "extended reasoning" mode for science and engineering that achieves 84.6% on ARC-AGI-2 and gold-medal-level performance on international Olympiads. Chinese AI labs DeepSeek and Zhipu AI also rolled out major upgrades: DeepSeek expanded its model's context window to over 1,000,000 tokens, and Zhipu AI launched GLM-5 for "agentic engineering." ByteDance pre-released Seedance 2.0, a multimodal video generator that accepts up to 12 inputs and outputs clips of 4 to 15 seconds with precise reference capabilities, reportedly surpassing OpenAI's Sora 2.

Key takeaway

For CTOs and VPs of Engineering evaluating AI investments, the rapid advances in multi-agent capabilities, expanded context windows, and specialized reasoning models from Anthropic, OpenAI, and Google are reason to revisit current AI strategies. Teams should explore integrating these agentic workflows and long-context models to improve productivity in software development, scientific research, and content creation, while monitoring Chinese labs that offer high-performance, cost-effective alternatives.

Key insights

Frontier AI models are rapidly advancing in multi-agent collaboration, extended context, and specialized reasoning across diverse domains.

Principles

Test-time compute improves accuracy in complex reasoning tasks.
Reinforcement learning drives rapid, significant model improvements.
Sparse attention balances long-context performance with efficiency.
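
To make the sparse-attention principle concrete, here is a minimal sketch of one common sparse pattern, a causal sliding window. The summary does not say which sparsity scheme any of the models use, so the pattern, the function names, and the sizes below are illustrative assumptions only. The sketch counts how many query-key pairs are computed with and without the window.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a causal mask: mask[i][j] is True when query position i
    may attend to key position j, i.e. j is one of the last `window`
    positions up to and including i."""
    return [[0 <= i - j < window for j in range(seq_len)]
            for i in range(seq_len)]


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count the query-key pairs the mask allows (a proxy for compute)."""
    return sum(sum(row) for row in mask)


# Hypothetical sizes chosen only for illustration.
SEQ_LEN = 8
full_pairs = attended_pairs(sliding_window_mask(SEQ_LEN, SEQ_LEN))  # full causal attention
sparse_pairs = attended_pairs(sliding_window_mask(SEQ_LEN, 3))      # window of 3 tokens
print(full_pairs, sparse_pairs)
```

With a fixed window, the number of pairs grows linearly with sequence length instead of quadratically, which is the efficiency side of the trade-off; the cost is that distant tokens are no longer directly visible to each other.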

Method

Agent teams split complex tasks into parallel subtasks for faster completion. Models use internal verification to prune incorrect reasoning paths. Retrieval-aware distillation preserves critical attention heads while replacing others with SSM recurrent heads.
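
The parallel-subtask idea can be sketched in a few lines. This is not Anthropic's agent-teams API: `run_subtask` is a hypothetical stand-in for a single agent call, and the thread pool is just one assumed way to run independent subtasks concurrently and gather the results.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subtask(subtask: str) -> str:
    """Hypothetical stand-in for one agent handling one subtask.
    A real system would call a model or tool here."""
    return f"done: {subtask}"


def run_agent_team(subtasks: list[str], max_workers: int = 4) -> list[str]:
    """Run each subtask in its own worker thread and return the
    results in the same order as the input subtasks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_subtask, subtasks))


# Hypothetical subtasks for illustration.
results = run_agent_team(["write tests", "refactor parser", "update docs"])
print(results)
```

The sketch only works when subtasks are independent; subtasks that depend on each other's output would need an explicit ordering or a coordinating step.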

In practice

Use agent teams for complex, multi-step projects.
Leverage native AI integrations for productivity apps.
Explore specialized models for scientific and engineering tasks.

Topics

AI Agent Systems
Large Language Models
Generative Video AI
AI for Software Engineering
Long Context Windows

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, AI Product Manager, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by Last Week in AI.