Alibaba's AI video model rises to No. 2 in global rankings, as OpenAI's Sora and ByteDance's Seedance fall away

· Source: VentureBeat · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Emerging Technologies & Innovation · Depth: Intermediate, medium

Summary

Alibaba Cloud has released HappyHorse 1.1, an upgraded AI video generation model that is now live on Alibaba Cloud Model Studio with API access and a 40% launch discount. The release arrives as the market contracts: OpenAI discontinued Sora because it was financially unsustainable, and ByteDance suspended the international rollout of Seedance 2.0 over copyright issues. The previous version, HappyHorse 1.0, secured the No. 2 position on the Artificial Analysis Video Arena with a score of 1,444 in text-to-video and image-to-video, ahead of Google's Veo-3.1 by 69 points and xAI's Grok-Imagine-Video by 23 points. The 1.1 upgrade adds multi-image reference (R2V) for consistent identity, improved motion quality, enhanced visual textures, and "zero-drift lip sync." The model is built on a 15-billion-parameter unified self-attention Transformer that handles all modalities in a single model. Alibaba's $52.7 billion global infrastructure investment backs the release, though the company's June 8 Pentagon listing as a Chinese military company poses geopolitical risks to adoption.

Key takeaway

For AI Product Managers evaluating enterprise video generation, Alibaba's HappyHorse 1.1 is a compelling, production-ready option. Its unified architecture and features such as R2V address critical commercial pain points, and the 40% launch discount offers significant cost savings. Weigh those technical and cost advantages against the geopolitical risks of Alibaba's Pentagon listing, especially if your operations have U.S. government exposure or transatlantic ties.

Key insights

Alibaba's HappyHorse 1.1 offers a unified, high-quality AI video generation solution, capitalizing on competitor withdrawals and infrastructure investment.

Principles

Method

HappyHorse uses a 15-billion-parameter unified self-attention Transformer that processes text, image, video, and audio tokens in a single sequence, eliminating the need for a separate model per modality.
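
To make the single-sequence idea concrete, here is a minimal toy sketch in Python. It is not HappyHorse's implementation: the article does not describe the tokenizers, projections, or positional scheme, so the random embeddings, the identity query/key/value projections, and the names `toy_embed`, `pack_sequence`, and `self_attention` are all assumptions made for illustration. It only shows the general technique of packing tokens from several modalities into one sequence so that one self-attention pass lets every token attend to every other, regardless of modality.

```python
import math
import random

MODALITIES = ("text", "image", "video", "audio")
DIM = 4  # toy embedding width (assumption; the real width is not stated)


def toy_embed(modality, count, rng):
    """Return `count` random vectors standing in for one modality's tokens."""
    return [(modality, [rng.uniform(-1, 1) for _ in range(DIM)])
            for _ in range(count)]


def pack_sequence(token_counts, rng):
    """Concatenate tokens from every modality into one flat sequence."""
    sequence = []
    for modality in MODALITIES:
        sequence.extend(toy_embed(modality, token_counts.get(modality, 0), rng))
    return sequence


def self_attention(vectors):
    """Single-head scaled dot-product self-attention.

    Queries, keys, and values are the input vectors themselves
    (identity projections), which keeps the sketch short.
    """
    width = len(vectors[0])
    scale = math.sqrt(width)
    weights = []
    for query in vectors:
        scores = [sum(a * b for a, b in zip(query, key)) / scale
                  for key in vectors]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    outputs = [[sum(w * v[d] for w, v in zip(row, vectors))
                for d in range(width)]
               for row in weights]
    return outputs, weights


rng = random.Random(0)
sequence = pack_sequence({"text": 3, "image": 2, "video": 2, "audio": 1}, rng)
labels = [modality for modality, _ in sequence]
vectors = [vector for _, vector in sequence]
outputs, weights = self_attention(vectors)

# How much attention the first text token spends on each modality.
mass = {modality: 0.0 for modality in MODALITIES}
for label, weight in zip(labels, weights[0]):
    mass[label] += weight
print(mass)
```

Because all eight toy tokens share one sequence, the first text token's attention weights are spread across text, image, video, and audio positions alike. A design built from separate per-modality models would need an extra fusion step to get the same cross-modal interaction.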

Best for: CTO, VP of Engineering/Data, Computer Vision Engineer, AI Engineer, Director of AI/ML, AI Product Manager

Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.