Alibaba's Qwen3.7-Plus supports text, video and imagery inputs at low cost of $0.4/$1.6 per 1M token — but it's proprietary
Summary
Alibaba has released Qwen3.7-Plus, its latest large language model, featuring enhanced multimodal capabilities for text, video, and imagery inputs. This new model is available under a closed commercial license via proprietary APIs, marking a shift from Alibaba's previous open-source strategy. Qwen3.7-Plus offers a 60% lower cost than the text-only Qwen3.7-Max, with standard input processing at \$0.40 and output at \$1.60 per million tokens, and cached reads at \$0.04 per million tokens. It includes a 1-million token context window and a "preserve_thinking" parameter, allocating 256K tokens for internal chain-of-thought processing to prevent state decay in long-horizon tasks. Benchmarks show Qwen3.7-Plus scored 70.3 on Terminal Bench 2.0-Terminus and 79.0 on ScreenSpot Pro, outperforming models like DeepSeek-V4-Pro Max (67.9) and GPT-5.4 (67.4) in specific agentic and computer vision tasks, positioning it as a cost-effective alternative for enterprise-grade visual analysis and automated workflows.
Key takeaway
For AI Architects evaluating cost-performance trade-offs in multimodal agent deployments, Qwen3.7-Plus presents a compelling option. If your organization requires robust visual interface interpretation and command execution for high-frequency RPA or data engineering, you should consider its \$0.40/\$1.60 per million token pricing and \$0.04 cached reads. Be aware that its proprietary, cloud-only API necessitates careful evaluation against your data sovereignty and compliance requirements, as local deployment is not possible.
Key insights
Qwen3.7-Plus offers cost-effective multimodal AI with state preservation, despite its proprietary nature.
Principles
- State decay is a critical agentic bottleneck.
- Multimodal capabilities enhance enterprise automation.
- API compatibility simplifies model integration.
Method
The model uses a 1-million token context window with 256K for chain-of-thought and a "preserve_thinking" API parameter to maintain internal logic across multi-turn tasks.
In practice
- Route repetitive system operations to Qwen3.7-Plus.
- Utilize granular caching for high-frequency agent iterations.
- Integrate via OpenAI-compatible API endpoints.
Topics
- Multimodal AI
- Large Language Models
- Autonomous Agents
- API Pricing
- Context Management
- Data Sovereignty
Best for: CTO, VP of Engineering/Data, Machine Learning Engineer, AI Engineer, AI Architect, Director of AI/ML
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.