Claude Code Anniversary + Launches from: Qwen 3.5, Cursor Demos, Cognition Devin 2.2, Inception Mercury 2

2026-02-24 · Source: AINews · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Advanced, extended

Summary

Alibaba has launched the Qwen 3.5 "medium series" of models, including Qwen3.5-Flash, Qwen3.5-35B-A3B (MoE), Qwen3.5-122B-A10B (MoE), and Qwen3.5-27B (dense), emphasizing intelligence-per-watt over sheer parameter count. Notably, the 35B-A3B model reportedly surpasses its 235B predecessor. OpenAI released GPT-5.3-Codex to developers via the Responses API at $1.75 input / $14 output per million tokens, with expanded file-input types and WebSocket support for 30% faster rollouts. Anthropic introduced "Claude Code Remote Control" for terminal sessions and expanded customization of enterprise workflows. Inception Labs unveiled Mercury 2, a diffusion LLM that prioritizes speed, reaching ~1,000 output tokens/s.

Meanwhile, agent reliability remains a concern: a Princeton study identifies a significant gap between capability and reliability, along with new failure modes such as "routine-step decomposition" safety bypasses.

On the hardware side, Meta announced a multi-year deal with AMD for 6GW of Instinct GPUs, and MatX secured a $500M Series B for its "One" accelerator chip, which combines systolic-array efficiency with HBM plus SRAM for long-context workloads.
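The pricing and throughput figures above lend themselves to quick arithmetic. The following is a back-of-envelope sketch under stated assumptions: it treats the GPT-5.3-Codex prices as USD per million tokens and uses 1,000 tokens/s as a stand-in for Mercury 2's reported decode speed. The 40k-in / 2k-out agent turn and the 100 tokens/s comparison point are hypothetical.

```python
# Back-of-envelope cost and decode time for one agent turn.
# Assumptions: prices are USD per 1M tokens (the GPT-5.3-Codex figures quoted
# above); 1,000 tokens/s stands in for Mercury 2's reported throughput; the
# 40k-in / 2k-out workload is hypothetical. Caching discounts and
# time-to-first-token are ignored.

PRICE_IN_PER_M = 1.75    # USD per 1M input tokens
PRICE_OUT_PER_M = 14.00  # USD per 1M output tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the per-million-token prices above."""
    return (input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M) / 1_000_000


def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate output_tokens at a constant decode throughput."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    print(f"cost per turn: ${request_cost(40_000, 2_000):.4f}")          # $0.0980
    print(f"decode @ 1,000 tok/s: {decode_seconds(2_000, 1_000):.1f} s")  # 2.0 s
    print(f"decode @ 100 tok/s:   {decode_seconds(2_000, 100):.1f} s")    # 20.0 s
```

In this hypothetical turn, output is under 5% of the tokens but roughly 29% of the cost. A 10x throughput gap also turns a 2-second wait into a 20-second one, and that difference compounds across a multi-step agent loop.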

Key takeaway

For NLP engineers and CTOs evaluating model deployment strategies, the rapid progress of efficient open-weight models like Qwen 3.5 and of specialized coding agents from OpenAI and Anthropic demands attention. Prioritize models that deliver high intelligence-per-watt and low-latency inference, especially for edge and consumer-device applications; test agent reliability rigorously; and manage context carefully, since bloated context raises costs and can degrade performance.
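One way to make "test agent reliability rigorously" concrete is to rerun each task several times and compare two numbers: the average success rate, which reflects capability, and the share of tasks solved on every run, which reflects reliability. This is a minimal sketch with made-up trial data. The metric names and `TRIALS` are illustrative and do not reproduce the Princeton study's methodology.

```python
# Capability vs. reliability on repeated runs of the same agent tasks.
# Assumptions: TRIALS is made-up illustrative data, and these two metrics are a
# simplified stand-in, not the Princeton study's methodology.

from typing import Dict, List

# Hypothetical outcomes: four runs per task (True = task solved).
TRIALS: Dict[str, List[bool]] = {
    "fix_failing_test": [True, True, True, True],
    "rename_module":    [True, False, True, True],
    "update_config":    [True, True, False, True],
    "migrate_schema":   [False, True, True, False],
}


def mean_success(results: Dict[str, List[bool]]) -> float:
    """Average per-task success rate: a capability-style number."""
    per_task = [sum(runs) / len(runs) for runs in results.values()]
    return sum(per_task) / len(per_task)


def all_runs_success(results: Dict[str, List[bool]]) -> float:
    """Share of tasks solved on every run: a reliability-style number."""
    return sum(all(runs) for runs in results.values()) / len(results)


if __name__ == "__main__":
    print(f"mean success:     {mean_success(TRIALS):.2f}")      # 0.75
    print(f"all-runs success: {all_runs_success(TRIALS):.2f}")  # 0.25
```

Here an agent that looks 75% capable completes only a quarter of its tasks consistently. Surfacing a gap of that kind between the two numbers is what a reliability-focused evaluation is meant to do.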

Key insights

AI development prioritizes efficiency and specialized capabilities, with open-weight models challenging larger predecessors and agent reliability remaining a key hurdle.

Principles

Architecture, data, and RL can outperform raw parameter scaling.
Latency and throughput are emerging as critical competitive battlegrounds.
LLM-generated context files can decrease agent success and increase costs.
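The cost side of the third principle follows from the fact that a context file is usually part of the prompt on every request in an agent session. The sketch below assumes hypothetical file sizes (300 vs. 3,000 tokens), a 50-turn session, and no prompt caching. It also reuses the $1.75 per million input tokens quoted for GPT-5.3-Codex.

```python
# Extra input tokens and cost from a context file that is resent on every turn.
# Assumptions: hypothetical file sizes and turn count; no prompt caching; the
# input price reuses the $1.75 per 1M tokens figure quoted for GPT-5.3-Codex.

from typing import Tuple

PRICE_IN_PER_M = 1.75  # USD per 1M input tokens (assumed)


def context_overhead(file_tokens: int, turns: int) -> Tuple[int, float]:
    """Return (extra input tokens, extra USD) for a session of `turns` requests."""
    extra_tokens = file_tokens * turns
    return extra_tokens, extra_tokens * PRICE_IN_PER_M / 1_000_000


if __name__ == "__main__":
    for label, size in [("minimal, hand-written", 300), ("verbose, generated", 3_000)]:
        tokens, usd = context_overhead(size, turns=50)
        print(f"{label}: +{tokens:,} tokens, +${usd:.4f} per 50-turn session")
```

The overhead grows linearly in both file size and turn count. Prompt caching would reduce the dollar figure, but the cached tokens still occupy the model's context on every turn.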

Method

Alibaba's Qwen 3.5 models use a hybrid architecture that combines Gated Delta Networks with Mixture-of-Experts to improve multimodal learning and inference efficiency, and they support 201 languages.
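To illustrate what a Gated Delta Network layer computes, here is a minimal single-head sketch of the gated delta rule as formulated in the Gated DeltaNet paper (Yang et al.). A decay gate `alpha` scales the memory, and a write strength `beta` moves the value stored under key `k` toward `v` instead of simply adding to it. The sketch assumes unit-norm keys and takes `alpha` and `beta` as given scalars. Qwen 3.5's actual layer is not modeled: learned gates, normalization, chunked parallel kernels, and the way these layers are interleaved with attention and MoE blocks are all omitted.

```python
# One head of a gated delta rule recurrence, in plain Python.
# S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T,   o_t = S_t q_t
# Assumptions: unit-norm keys, scalar alpha (decay gate) and beta (write strength)
# supplied directly; no normalization, chunked parallel form, or Qwen-specific details.

from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product m @ x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def gated_delta_step(state: Matrix, k: Vector, v: Vector, q: Vector,
                     alpha: float, beta: float) -> Tuple[Matrix, Vector]:
    """Update the (d_v x d_k) memory with key k / value v, then read it with query q."""
    recalled = matvec(state, k)  # what the memory currently returns for key k
    # Expanding the update: S_t = alpha*S + beta*(v - alpha*S k) k^T
    err = [vi - alpha * ri for vi, ri in zip(v, recalled)]
    new_state = [[alpha * state[i][j] + beta * err[i] * k[j] for j in range(len(k))]
                 for i in range(len(v))]
    return new_state, matvec(new_state, q)


if __name__ == "__main__":
    S: Matrix = [[0.0, 0.0], [0.0, 0.0]]
    k1, k2 = [1.0, 0.0], [0.0, 1.0]
    S, _ = gated_delta_step(S, k1, [3.0, 4.0], k1, alpha=1.0, beta=1.0)
    S, _ = gated_delta_step(S, k2, [5.0, 6.0], k2, alpha=1.0, beta=1.0)
    # Rewriting k1 replaces its old value instead of adding to it.
    S, out = gated_delta_step(S, k1, [7.0, 8.0], k1, alpha=1.0, beta=1.0)
    print(out)             # [7.0, 8.0]
    print(matvec(S, k2))   # [5.0, 6.0]
```

With plain additive linear attention (S += v k^T), the third write would read back [10.0, 12.0] for `k1`. The delta rule's error term overwrites the old association instead. Setting `alpha` below 1 decays every stored association at each step, which is the role of the gate.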

In practice

Consider Qwen3.5-35B-A3B for high intelligence-per-watt applications.
Explore GPT-5.3-Codex for coding agents that need its expanded file-input types.
Prioritize minimal, developer-written context files for agents to reduce costs.
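The intelligence-per-watt argument for Qwen3.5-35B-A3B rests on sparse activation. A router sends each token to a few experts, so only a fraction of the parameters do work per token. The sketch below shows top-k routing with toy scalar experts. It then compares per-token forward FLOPs using the rough rule of about 2 FLOPs per active parameter. It assumes the model names mean ~35B total / ~3B active parameters and 27B dense. The routing details are illustrative and do not reflect Qwen 3.5's actual router or expert configuration.

```python
# Top-k expert routing for one token, plus active-vs-total parameter arithmetic.
# Assumptions: toy scalar experts and router logits; reading "35B-A3B" as ~35B total /
# ~3B active parameters; ~2 FLOPs per active parameter per token is a rough rule of
# thumb for the forward pass. This is not Qwen 3.5's router or expert configuration.

import math
from typing import Callable, List, Sequence, Tuple


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_forward(x: float, logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int) -> float:
    """Only the k routed experts run; the rest are skipped for this token."""
    return sum(w * experts[i](x) for i, w in top_k_route(logits, k))


def forward_flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: about 2 FLOPs per active parameter."""
    return 2.0 * active_params


if __name__ == "__main__":
    experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.1, 2.0, 1.0, -1.0]
    print(top_k_route(logits, k=2))
    print(moe_forward(1.0, logits, experts, k=2))
    moe, dense = forward_flops_per_token(3e9), forward_flops_per_token(27e9)
    print(f"27B dense vs 35B-A3B, per-token FLOPs ratio: {dense / moe:.1f}x")  # 9.0x
```

By this estimate the MoE model needs about a ninth of the dense 27B model's per-token compute. However, all of its ~35B weights must still fit in memory, so the saving is in compute per token rather than in memory footprint.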

Topics

Large Language Models
AI Agents
Model Distillation
AI Hardware
Robotics

Best for: NLP Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by AINews.