Alibaba's AI video model rises to No. 2 in global rankings, as OpenAI's Sora and ByteDance's Seedance fall away

2026-06-22 · Source: VentureBeat · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Emerging Technologies & Innovation · Depth: Intermediate, medium

Summary

Alibaba Cloud has released HappyHorse 1.1, an upgraded AI video generation model now live on Alibaba Cloud Model Studio with API access and a 40% launch discount. The release arrives as the field of competitors narrows: OpenAI discontinued Sora because it was financially unsustainable, and ByteDance suspended the international rollout of Seedance 2.0 over copyright issues. HappyHorse 1.0 had already secured the No. 2 position on the Artificial Analysis Video Arena, scoring 1,444 in text-to-video and image-to-video and surpassing Google's Veo-3.1 by 69 points and xAI's Grok-Imagine-Video by 23 points. The 1.1 upgrade adds multi-image reference (R2V) for consistent identity, improved motion quality, enhanced visual textures, and "zero-drift lip sync." The model is built on a 15-billion-parameter unified self-attention Transformer that handles all modalities in one network. Alibaba's $52.7 billion global infrastructure investment backs the release, though the company's June 8 Pentagon listing as a Chinese military company poses geopolitical adoption risks.

Key takeaway

For AI Product Managers evaluating enterprise video generation solutions, Alibaba's HappyHorse 1.1 is a compelling, production-ready option. Its unified architecture and features such as R2V address critical commercial pain points, and the 40% launch discount offers significant cost savings. However, you must weigh these technical and cost advantages against the geopolitical risks stemming from Alibaba's Pentagon listing, especially if your operations have U.S. government exposure or transatlantic ties.

Key insights

Alibaba's HappyHorse 1.1 offers a unified, high-quality AI video generation solution that capitalizes on competitor withdrawals and is backed by large-scale infrastructure investment.

Principles

Unified architecture simplifies integration.
Human evaluation drives quality benchmarks.
Infrastructure investment enables enterprise scale.

Method

HappyHorse uses a 15-billion-parameter unified self-attention Transformer to process text, image, video, and audio tokens in a single sequence, eliminating the need for a separate model per modality.
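
To make the single-sequence idea concrete, here is a minimal sketch in plain Python. It is not HappyHorse's implementation: the token counts, the 8-dimensional random embeddings, the `make_tokens` helper, and the identity query/key/value projections are all assumptions chosen for illustration. It only shows that once tokens from every modality sit in one sequence, ordinary self-attention lets each token attend to all the others.

```python
import math
import random

# Modalities named in the source; everything else here is a toy assumption.
MODALITIES = ("text", "image", "video", "audio")


def make_tokens(modality, count, dim, rng):
    """Hypothetical stand-in for a real encoder: random embeddings tagged by modality."""
    return [(modality, [rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(count)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    dim = len(vectors[0])
    scale = 1.0 / math.sqrt(dim)
    weights, outputs = [], []
    for query in vectors:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in vectors]
        row = softmax(scores)
        weights.append(row)
        # Each output is the attention-weighted average of all token vectors.
        outputs.append([sum(w * v[i] for w, v in zip(row, vectors)) for i in range(dim)])
    return outputs, weights


rng = random.Random(0)
counts = {"text": 3, "image": 2, "video": 4, "audio": 2}  # assumed toy sizes

# Build ONE sequence containing tokens from every modality.
sequence = []
for modality in MODALITIES:
    sequence.extend(make_tokens(modality, counts[modality], dim=8, rng=rng))

labels = [m for m, _ in sequence]
vectors = [v for _, v in sequence]
outputs, weights = self_attention(vectors)

# Share of the first text token's attention that lands on non-text tokens.
cross_modal_mass = sum(w for w, m in zip(weights[0], labels) if m != "text")
print(len(sequence), "tokens in one sequence")
print(f"attention from first text token to non-text tokens: {cross_modal_mass:.3f}")
```

In a production model the embeddings would come from learned encoders and the attention would use learned projections across many heads and layers; the point of the sketch is only that no modality-specific model boundary separates the tokens.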

In practice

Use R2V for consistent character identity.
Generate 1080p video with synchronized audio.
Specify complex prompts for precise control.

Topics

AI Video Generation
HappyHorse 1.1
Alibaba Cloud
Enterprise AI
Geopolitical Risk
Cloud Infrastructure
Video Benchmarking

Best for: CTO, VP of Engineering/Data, Computer Vision Engineer, AI Engineer, Director of AI/ML, AI Product Manager

Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.