Alibaba's Qwen3.7-Plus supports text, video and imagery inputs at low cost of $0.4/$1.6 per 1M token — but it's proprietary

2026-06-02 · Source: VentureBeat · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, medium

Summary

Alibaba has released Qwen3.7-Plus, its latest large language model, featuring enhanced multimodal capabilities for text, video, and imagery inputs. This new model is available under a closed commercial license via proprietary APIs, marking a shift from Alibaba's previous open-source strategy. Qwen3.7-Plus offers a 60% lower cost than the text-only Qwen3.7-Max, with standard input processing at \$0.40 and output at \$1.60 per million tokens, and cached reads at \$0.04 per million tokens. It includes a 1-million token context window and a "preserve_thinking" parameter, allocating 256K tokens for internal chain-of-thought processing to prevent state decay in long-horizon tasks. Benchmarks show Qwen3.7-Plus scored 70.3 on Terminal Bench 2.0-Terminus and 79.0 on ScreenSpot Pro, outperforming models like DeepSeek-V4-Pro Max (67.9) and GPT-5.4 (67.4) in specific agentic and computer vision tasks, positioning it as a cost-effective alternative for enterprise-grade visual analysis and automated workflows.

Key takeaway

For AI Architects evaluating cost-performance trade-offs in multimodal agent deployments, Qwen3.7-Plus presents a compelling option. If your organization requires robust visual interface interpretation and command execution for high-frequency RPA or data engineering, you should consider its \$0.40/\$1.60 per million token pricing and \$0.04 cached reads. Be aware that its proprietary, cloud-only API necessitates careful evaluation against your data sovereignty and compliance requirements, as local deployment is not possible.

Key insights

Qwen3.7-Plus offers cost-effective multimodal AI with state preservation, despite its proprietary nature.

Principles

State decay is a critical agentic bottleneck.
Multimodal capabilities enhance enterprise automation.
API compatibility simplifies model integration.

Method

The model uses a 1-million token context window with 256K for chain-of-thought and a "preserve_thinking" API parameter to maintain internal logic across multi-turn tasks.

In practice

Route repetitive system operations to Qwen3.7-Plus.
Utilize granular caching for high-frequency agent iterations.
Integrate via OpenAI-compatible API endpoints.

Topics

Multimodal AI
Large Language Models
Autonomous Agents
API Pricing
Context Management
Data Sovereignty

Best for: CTO, VP of Engineering/Data, Machine Learning Engineer, AI Engineer, AI Architect, Director of AI/ML

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.