Alibaba's AI video model rises to No. 2 in global rankings, as OpenAI's Sora and ByteDance's Seedance fall away
Summary
Alibaba Cloud has released HappyHorse 1.1, an upgraded AI video generation model now live on Alibaba Cloud Model Studio with API access and a 40% launch discount. The release arrives as the market contracts: OpenAI discontinued Sora as financially unsustainable, and ByteDance suspended the international rollout of Seedance 2.0 over copyright concerns. HappyHorse 1.0 previously secured the No. 2 position on the Artificial Analysis Video Arena with a score of 1,444 in text-to-video and image-to-video, surpassing Google's Veo-3.1 by 69 points and xAI's Grok-Imagine-Video by 23 points. The 1.1 upgrade adds multi-image reference (R2V) for consistent identity, improved motion quality, enhanced visual textures, and "zero-drift lip sync." The model is built on a 15-billion-parameter unified self-attention Transformer that handles text, image, video, and audio together. Alibaba's $52.7 billion global infrastructure investment backs the service, though the company's June 8 Pentagon listing as a Chinese military company poses geopolitical adoption risks.
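To put the 69-point and 23-point gaps in perspective, the short sketch below converts a rating gap into an expected head-to-head preference rate. It assumes the Arena score behaves like a standard Elo rating with a 400-point logistic scale; the source does not confirm this, and the function name `elo_expected_win_rate` is illustrative.

```python
def elo_expected_win_rate(rating_gap, scale=400.0):
    """Expected share of pairwise votes won by the higher-rated model.

    Assumption: the leaderboard score follows the standard Elo logistic
    formula with a 400-point scale. The source does not state this.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# Rating gaps between HappyHorse 1.0 and each rival, as quoted in the summary.
gaps = {"Veo-3.1": 69, "Grok-Imagine-Video": 23}

win_rates = {rival: elo_expected_win_rate(gap) for rival, gap in gaps.items()}

for rival, rate in win_rates.items():
    print(f"HappyHorse 1.0 vs {rival}: ~{rate:.0%} expected preference rate")
```

Under that assumption, the gaps work out to roughly 60% and 53% of head-to-head votes, so the lead over Grok-Imagine-Video in particular is narrow.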
Key takeaway
For AI Product Managers evaluating enterprise video generation solutions, Alibaba's HappyHorse 1.1 presents a compelling, production-ready option. Its unified architecture and advanced features like R2V address critical commercial pain points, while the 40% launch discount offers significant cost savings. However, you must weigh the technical advantages and cost benefits against potential geopolitical risks stemming from Alibaba's Pentagon listing, especially for operations with U.S. government exposure or transatlantic ties.
Key insights
Alibaba's HappyHorse 1.1 offers a unified, high-quality AI video generation solution, capitalizing on competitor withdrawals and infrastructure investment.
Principles
- Unified architecture simplifies integration.
- Human evaluation drives quality benchmarks.
- Infrastructure investment enables enterprise scale.
Method
HappyHorse uses a 15-billion-parameter unified self-attention Transformer that processes text, image, video, and audio tokens in a single sequence, eliminating the need for a separate model per modality.
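As a rough illustration of the single-sequence idea, the toy sketch below packs tokens from all four modalities into one list and runs one self-attention pass over it. This is not HappyHorse's implementation, which the source does not detail: the 4-dimensional embeddings, the token values, the identity projections, and the names `pack_sequence` and `self_attention` are all assumptions made for illustration.

```python
import math


def pack_sequence(streams):
    """Concatenate per-modality token embeddings into one sequence.

    Returns the packed tokens and a parallel list of modality labels.
    """
    tokens, labels = [], []
    for modality, embeddings in streams.items():
        for vec in embeddings:
            tokens.append(vec)
            labels.append(modality)
    return tokens, labels


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention (identity projections).

    Every token attends to every token in the packed sequence,
    regardless of which modality it came from.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, tokens)) for i in range(dim)]
        )
    return outputs, weights


# Hypothetical toy embeddings; a real model would use learned tokenizers.
streams = {
    "text": [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]],
    "image": [[0.0, 1.0, 0.0, 0.0]],
    "video": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.1, 0.9, 0.0]],
    "audio": [[0.0, 0.0, 0.0, 1.0]],
}

tokens, labels = pack_sequence(streams)
outputs, weights = self_attention(tokens)
print(labels)
```

Because the attention matrix spans the whole packed sequence, an audio token can draw directly on video tokens and vice versa. That is the property a unified design relies on, in contrast to pipelines that stitch together separate per-modality models.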
In practice
- Use R2V for consistent character identity.
- Generate 1080p video with synchronized audio.
- Specify complex prompts for precise control.
Topics
- AI Video Generation
- HappyHorse 1.1
- Alibaba Cloud
- Enterprise AI
- Geopolitical Risk
- Cloud Infrastructure
- Video Benchmarking
Best for: CTO, VP of Engineering/Data, Computer Vision Engineer, AI Engineer, Director of AI/ML, AI Product Manager
Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.