🇨🇳 Alibaba unveils new Qwen3.5 model for ‘agentic AI era’, Qwen3.5-397B-A17B. Apache 2.0 license
Summary
Alibaba has introduced Qwen3.5-397B-A17B, a 397B-parameter sparse mixture-of-experts model designed for the "agentic AI era" and released under the Apache 2.0 license. The model keeps only 17B parameters active per token and reports up to 19.0x higher decode throughput than Qwen3-Max at 256K context. Reported benchmark scores include IFBench (76.5), BFCL v4 (72.9), AIME26 (91.3), and SWE-bench Verified (76.4). Its efficiency stems from a hybrid attention mechanism that mixes linear attention (Gated DeltaNet) with regular attention layers to limit memory growth at long context. Alibaba also launched Qwen3.5-Plus, a managed API version with a 1M-token context window and built-in tools.
Concurrently, Nanbeige LLM Lab released Nanbeige4.1-3B, a 3B-parameter model that outperforms Qwen3-4B-2507 on deep-search agent benchmarks and LeetCode contests and supports contexts of up to 256K tokens.
Rumors also circulate about Seedance 3.0's advanced video generation capabilities, including full-length feature films from single prompts and native multilingual dubbing. Meanwhile, the U.S. Pentagon is reportedly pressuring AI labs to allow broad military use of their models, with Anthropic facing potential contract termination over its refusal to allow use for fully autonomous weapons or mass domestic surveillance.
Key takeaway
CTOs and VPs of Engineering evaluating next-generation AI models should assess Qwen3.5 and Nanbeige4.1-3B for their efficiency, long-context processing, and agentic capabilities. Teams should investigate hybrid attention architectures and multi-stage reinforcement learning as ways to improve model performance and resource utilization. Also track the reported Pentagon pressure on AI labs over military use of frontier models, as it may influence future policy and access to certain technologies.
Key insights
Sparse Mixture-of-Experts and hybrid attention enhance LLM efficiency and long-context performance for agentic AI.
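As a rough illustration of what "sparse" means here, the sketch below computes the active-parameter share from the two figures quoted in the summary (397B total, 17B active) and shows a generic top-k router. The routing scheme is an assumption: the source does not describe how Qwen3.5 selects experts, and top-k softmax gating is only one common mixture-of-experts design. The function names are hypothetical.

```python
import math

TOTAL_PARAMS_B = 397   # total parameters in billions, from the summary
ACTIVE_PARAMS_B = 17   # active parameters per token in billions, from the summary


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_b / total_b


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Generic top-k gating (assumed, not confirmed for Qwen3.5).

    Picks the k experts with the highest router score and renormalizes
    their scores with a softmax so the mixing weights sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)          # for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share per token: {share:.1%}")
    # Toy example with 4 experts and k=2; real expert counts are not in the source.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

With the quoted figures, roughly 4% of the parameters are used for any one token, which is where the compute savings of a sparse model come from. The active count typically also includes shared components such as attention layers, so it should not be read as the size of the selected experts alone.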
Principles
- Hybrid attention improves long-context efficiency.
- RL scaling across diverse environments boosts agentic capabilities.
- Early fusion integrates multimodality natively.
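The first principle can be made concrete with a back-of-the-envelope memory estimate. A regular attention layer stores keys and values for every token, so its cache grows linearly with context length, while a linear-attention layer of the Gated DeltaNet kind keeps a fixed-size state per head. The sketch below compares the two under a purely hypothetical configuration: the layer counts, head counts, dimensions, and the 1-in-4 ratio of regular attention layers are illustrative assumptions, not published Qwen3.5 values.

```python
def kv_cache_bytes(context_tokens: int, full_attn_layers: int,
                   kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Key/value cache size for regular attention layers.

    The factor 2 covers keys and values; the size scales with context length.
    """
    return (2 * context_tokens * full_attn_layers
            * kv_heads * head_dim * bytes_per_value)


def linear_state_bytes(linear_layers: int, heads: int,
                       key_dim: int, value_dim: int,
                       bytes_per_value: int = 2) -> int:
    """Recurrent state size for linear-attention layers.

    Each head keeps one key_dim x value_dim matrix, independent of context length.
    """
    return linear_layers * heads * key_dim * value_dim * bytes_per_value


if __name__ == "__main__":
    context = 256 * 1024          # 256K tokens, the context length cited in the summary
    layers = 48                   # hypothetical total layer count
    full_layers = layers // 4     # hypothetical: 1 in 4 layers uses regular attention
    linear_layers = layers - full_layers

    all_full = kv_cache_bytes(context, layers, kv_heads=8, head_dim=128)
    hybrid = (kv_cache_bytes(context, full_layers, kv_heads=8, head_dim=128)
              + linear_state_bytes(linear_layers, heads=8,
                                   key_dim=128, value_dim=128))

    gib = 1024 ** 3
    print(f"all regular attention: {all_full / gib:.1f} GiB per sequence")
    print(f"hybrid stack:          {hybrid / gib:.1f} GiB per sequence")
```

Under these assumed numbers the hybrid stack needs about a quarter of the cache, because only the regular attention layers contribute a term that grows with context. Actual savings depend on the real architecture, which the source does not detail.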
Method
Qwen3.5 uses Gated DeltaNet for linear attention mixed with regular attention. Nanbeige4.1-3B employs upgraded supervised fine-tuning plus two reinforcement learning stages: point-wise scoring with GRPO and pair-wise comparisons.
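For readers unfamiliar with GRPO (Group Relative Policy Optimization), its core idea is to score several sampled responses to the same prompt and normalize each reward against the group, rather than training a separate value model. The sketch below shows only that normalization step. It is a generic illustration, not Nanbeige's training code: the reward values are made up, the function name is hypothetical, and the use of the population standard deviation is an implementation choice assumed here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-8) -> list[float]:
    """Normalize point-wise rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    that beat the group average get a positive training signal and the rest
    get a negative one. eps avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical point-wise scores for four responses to the same prompt.
    scores = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(scores))
```

If every response in a group receives the same score, all advantages are zero and the group contributes no gradient signal. The pair-wise comparison stage mentioned above is not covered by this sketch.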
In practice
- Deploy Qwen3.5-Plus for managed API access.
- Utilize Nanbeige4.1-3B for deep-search agent tasks.
- Consider hybrid attention for long-context LLM deployments.
Topics
- Large Language Models
- Mixture-of-Experts
- Reinforcement Learning
- Generative AI
- AI Ethics & Policy
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, AI Product Manager, Tech Journalist
Editorial summary, takeaway, and curation by AIssential. Original article published by Rohan's Bytes.