🇨🇳 Alibaba unveils new Qwen3.5 model for ‘agentic AI era’, Qwen3.5-397B-A17B. Apache 2.0 license

2025-08-21 · Source: Rohan's Bytes · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, medium

Summary

Alibaba has introduced Qwen3.5, a new 397B-parameter sparse mixture-of-experts model designed for the "agentic AI era." The model keeps only 17B parameters active per token, achieving up to 19.0x higher decode throughput than Qwen3-Max at 256K context. Qwen3.5 reports strong performance on benchmarks like IFBench (76.5), BFCL v4 (72.9), AIME26 (91.3), and SWE-bench Verified (76.4). Its efficiency stems from a hybrid attention mechanism that mixes linear attention via Gated DeltaNet with regular attention layers to manage memory growth. Alibaba also launched Qwen3.5-Plus, a managed API version with a 1M token context window and built-in tools.

Concurrently, Nanbeige LLM Lab released Nanbeige4.1-3B, a 3B-parameter model that outperforms Qwen3-4B-2507 on deep-search agent benchmarks and LeetCode contests and supports contexts of up to 256K tokens.

Rumors also circulate about Seedance 3.0's advanced video generation capabilities, including full-length feature films from single prompts and native multilingual dubbing.

Meanwhile, the U.S. Pentagon is reportedly pressuring AI labs for broad military use of their models, with Anthropic facing potential contract termination over its refusal to allow use for fully autonomous weapons or mass domestic surveillance.
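To put the sparse mixture-of-experts figures for Qwen3.5 in proportion, the arithmetic below works out what share of the 397B parameters is active per token. It is a back-of-the-envelope sketch: the function name is illustrative, and only the 397B and 17B figures come from the summary.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of a sparse MoE model's parameters used for each token."""
    return active_params_b / total_params_b


# Figures quoted in the summary, in billions of parameters.
QWEN35_TOTAL_B = 397.0
QWEN35_ACTIVE_B = 17.0

fraction = active_fraction(QWEN35_ACTIVE_B, QWEN35_TOTAL_B)
print(f"Active per token: {fraction:.1%}")  # prints "Active per token: 4.3%"
```

Roughly one parameter in 23 participates in each token's forward pass, which is consistent with the model being described as sparse. The throughput gain reported against Qwen3-Max is a separate measurement and does not follow from this ratio alone.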

Key takeaway

For CTOs and VPs of Engineering evaluating next-generation AI models, the Qwen3.5 and Nanbeige4.1-3B releases are worth assessing for their gains in efficiency, long-context processing, and agentic capabilities. Your teams should investigate hybrid attention architectures and multi-stage reinforcement learning as ways to improve model performance and resource utilization. Follow the reported dispute over military use of frontier AI models, as it may influence future policy and access to certain technologies.

Key insights

Sparse Mixture-of-Experts and hybrid attention enhance LLM efficiency and long-context performance for agentic AI.

Principles

Hybrid attention improves long-context efficiency.
RL scaling across diverse environments boosts agentic capabilities.
Early fusion integrates multimodality natively.
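The first principle above can be illustrated with a rough memory estimate. In a regular attention layer the key/value cache grows with sequence length, while a linear-attention layer keeps a fixed-size recurrent state. The sketch below compares the two under stated assumptions: every layer count, head count, and dimension is a hypothetical placeholder, not a published Qwen3.5 configuration, and the per-layer state size is a simplification of what a Gated DeltaNet layer stores.

```python
def kv_cache_bytes(seq_len, n_full_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """KV cache for regular attention layers: keys and values for every token."""
    return 2 * n_full_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def linear_state_bytes(n_linear_layers, n_heads, head_dim, bytes_per_value=2):
    """Fixed-size state for linear-attention layers: one head_dim x head_dim
    matrix per head, independent of sequence length (simplified)."""
    return n_linear_layers * n_heads * head_dim * head_dim * bytes_per_value


# Hypothetical configuration, chosen only for illustration.
SEQ_LEN = 256 * 1024   # 256K-token context
TOTAL_LAYERS = 48
FULL_LAYERS = 12       # assumed: one regular attention layer in every four
LINEAR_LAYERS = TOTAL_LAYERS - FULL_LAYERS
KV_HEADS, LINEAR_HEADS, HEAD_DIM = 8, 16, 128

all_full = kv_cache_bytes(SEQ_LEN, TOTAL_LAYERS, KV_HEADS, HEAD_DIM)
hybrid = (kv_cache_bytes(SEQ_LEN, FULL_LAYERS, KV_HEADS, HEAD_DIM)
          + linear_state_bytes(LINEAR_LAYERS, LINEAR_HEADS, HEAD_DIM))

gib = 1024 ** 3
print(f"all regular attention: {all_full / gib:.1f} GiB per sequence")
print(f"hybrid layout:         {hybrid / gib:.1f} GiB per sequence")
```

Under these assumed numbers, the sequence-length-dependent part of the cache shrinks in proportion to the share of layers that still use regular attention, which is the memory-growth effect the summary attributes to the hybrid design.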

Method

Qwen3.5 combines linear-attention layers based on Gated DeltaNet with regular attention layers. Nanbeige4.1-3B employs upgraded supervised fine-tuning followed by two reinforcement learning stages: one using point-wise scoring with GRPO, and one using pair-wise comparisons.
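For readers unfamiliar with GRPO, its central step is to score each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The sketch below shows that group-relative normalization only. It is not Nanbeige's training code; the function name, the `eps` constant, and the reward values are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar score per response sampled for a single prompt.
    `eps` avoids division by zero when every response gets the same score.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical point-wise scores for four sampled answers to one prompt.
scores = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(scores)

for score, adv in zip(scores, advantages):
    print(f"score={score:.1f}  advantage={adv:+.2f}")
```

Responses scoring above the group mean receive positive advantages and are reinforced, while those below are discouraged. A full GRPO update also involves a clipped policy-ratio objective and usually a KL penalty, both omitted here.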

In practice

Deploy Qwen3.5-Plus for managed API access.
Utilize Nanbeige4.1-3B for deep-search agent tasks.
Consider hybrid attention for long-context LLM deployments.

Topics

Large Language Models
Mixture-of-Experts
Reinforcement Learning
Generative AI
AI Ethics & Policy

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, AI Product Manager, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by Rohan's Bytes.