Alibaba Qwen Team Releases Qwen3.5-397B MoE Model with 17B Active Parameters and 1M Token Context for AI Agents

2026-02-16 · Source: Machine Learning ML & Generative AI News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, quick

Summary

Alibaba's Qwen Team has released Qwen3.5, an open-source model family whose flagship is Qwen3.5-397B-A17B: 397B total parameters, of which 17B are active per token. The model combines a sparse Mixture-of-Experts (MoE) architecture with a Gated Delta Network hybrid design, which is reported to give 400B-class reasoning performance at the inference speed of a 17B model and an 8.6x to 19.0x increase in decoding throughput. Qwen3.5 is a native vision-language model, trained via Early Fusion, and shows strong capabilities in agentic tasks and visual reasoning across 201 languages. The Qwen3.5-Plus version supports a 1M token context window. Released under the Apache 2.0 license, Qwen3.5 offers a high-performance, cost-efficient foundation for developing multimodal autonomous agents.

Key takeaway

For AI Architects and MLOps Engineers evaluating foundation models, Qwen3.5 is a compelling open-source option. Because only 17B of its 397B parameters are active per token, its MoE architecture provides 400B-class reasoning at the inference speed of a 17B model, which lowers operational costs while supporting complex agentic and visual reasoning tasks across 201 languages. Consider Qwen3.5 for your next generation of multimodal autonomous agents, especially given its Apache 2.0 license and the 1M token context window of Qwen3.5-Plus.
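To make the cost argument concrete, the short calculation below works through the parameter counts quoted in the summary. It is a back-of-the-envelope sketch, not a deployment guide: the 2 bytes per parameter figure is an assumption (16-bit weights), and the variable names are illustrative.

```python
# Back-of-the-envelope sizing from the figures quoted in the summary.
TOTAL_PARAMS = 397e9    # total parameters (the "397B" in 397B-A17B)
ACTIVE_PARAMS = 17e9    # parameters active per token (the "A17B")
BYTES_PER_PARAM = 2     # assumption: 16-bit weights, no quantization

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
weight_memory_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
active_weight_gb = ACTIVE_PARAMS * BYTES_PER_PARAM / 1e9

print(f"active fraction per token:  {active_fraction:.1%}")
print(f"weights resident in memory: {weight_memory_gb:.0f} GB")
print(f"weights used per token:     {active_weight_gb:.0f} GB")
```

Under these assumptions roughly 4.3% of the weights take part in each token's forward pass, but all of the weights (about 794 GB at 16 bits) still have to be held in memory. Sparsity reduces per-token compute, not the memory footprint.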

Key insights

The Qwen3.5 MoE flagship pairs 400B-class reasoning with 17B-level inference speed, and Qwen3.5-Plus extends the context window to 1M tokens.

Principles

Sparse MoE enables high performance with lower inference cost.
Early Fusion training supports native vision-language capabilities.

Method

The Qwen3.5 model combines a sparse Mixture-of-Experts (MoE) architecture with a Gated Delta Network hybrid design to balance reasoning power and inference efficiency.
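The sparse routing idea behind an MoE layer can be sketched in a few lines. The example below is a generic top-k router in plain Python, written for illustration only: the expert count, `top_k`, vector size, and all function names are assumptions, the summary does not describe Qwen3.5's actual router, and the Gated Delta Network component is not modeled here.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(dim, rng):
    """A toy expert: a single dim x dim linear layer with random weights."""
    weights = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return expert

def moe_forward(x, router, experts, top_k):
    """Send one token vector to its top_k experts and mix their outputs."""
    # One router logit per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    # Keep only the top_k highest-scoring experts; the rest are never run.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the gate weights over the chosen experts only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen

rng = random.Random(0)
DIM, NUM_EXPERTS, TOP_K = 8, 16, 2   # illustrative sizes, not Qwen3.5's
experts = [make_expert(DIM, rng) for _ in range(NUM_EXPERTS)]
router = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
token = [rng.gauss(0, 1) for _ in range(DIM)]

output, used = moe_forward(token, router, experts, TOP_K)
print(f"experts run for this token: {sorted(used)} ({TOP_K} of {NUM_EXPERTS})")
```

Only the selected experts are evaluated for each token, which is why per-token compute follows the active parameter count rather than the total.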

In practice

Use Qwen3.5 for multimodal AI agent development.
Use the 1M token context of Qwen3.5-Plus for complex reasoning tasks over long inputs.
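When working with a long context window, it helps to budget input tokens before sending a request. The sketch below greedily packs documents into a 1M token window while reserving room for the reply. It is a minimal illustration under stated assumptions: `estimate_tokens` is a hypothetical helper using a rough 4 characters per token heuristic (not the Qwen tokenizer), and `reserve_for_output` is an arbitrary illustrative value.

```python
CONTEXT_WINDOW = 1_000_000   # from the summary: Qwen3.5-Plus context size
CHARS_PER_TOKEN = 4          # assumption: rough heuristic, not the Qwen tokenizer

def estimate_tokens(text):
    """Hypothetical estimator; swap in a real tokenizer count in practice."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def pack_documents(docs, reserve_for_output=8_000):
    """Keep documents in order until the input token budget would be exceeded."""
    budget = CONTEXT_WINDOW - reserve_for_output
    kept, used = [], 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break  # stop at the first document that does not fit
        kept.append(doc)
        used += cost
    return kept, used

# Three synthetic documents of about 500k, 375k, and 250k estimated tokens.
docs = ["a" * 2_000_000, "b" * 1_500_000, "c" * 1_000_000]
kept, used = pack_documents(docs)
print(f"kept {len(kept)} of {len(docs)} documents, about {used:,} tokens")
```

With these synthetic inputs the first two documents fit (about 875,000 estimated tokens) and the third is left out, since adding it would exceed the 992,000 token input budget.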

Topics

Qwen3.5
Mixture-of-Experts
Vision-Language Models
AI Agents
Long Context Window

Code references

QwenLM/Qwen3.5

Best for: AI Architect, MLOps Engineer, CTO, AI Engineer, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.