Introducing Gemini Omni

2026-05-19 · Source: Google DeepMind News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Novice, long

Summary

Google has introduced Gemini Omni, a new multimodal AI model designed to create and edit videos from diverse inputs. The first model in this family, Gemini Omni Flash, is rolling out today to Google AI Plus, Pro, and Ultra subscribers via the Gemini app and Google Flow, and to YouTube Shorts and YouTube Create App users at no cost. Omni allows users to combine images, audio, video, and text as input to generate high-quality videos, and to edit them naturally using conversational language. Key capabilities include transforming scenes, reimagining actions, and refining videos across multiple turns while maintaining consistency. The model integrates Gemini's world knowledge, offering an intuitive understanding of physics and the ability to blend creativity with factual context for realistic and meaningful storytelling. It also supports creating videos with digital avatars and includes SynthID watermarking for content transparency.

Key takeaway

For content creators and AI product managers building video generation tools, Gemini Omni Flash offers a notable step forward in multimodal editing. Explore its conversational video editing and its ability to ground creations in real-world physics and knowledge. Consider integrating Omni Flash via APIs in the coming weeks to support complex video production workflows, using its consistent scene memory and iterative refinement to produce more sophisticated outputs.

Key insights

Gemini Omni Flash enables multimodal video creation and conversational editing, grounded in real-world knowledge and physics.

Principles

Multimodal input enhances creative flexibility.
Conversational editing maintains scene consistency.
Responsible AI requires transparency and watermarking.

Method

The model takes any combination of image, audio, video, and text inputs, generates a video, and then edits it iteratively from natural-language prompts while keeping characters and physics consistent across turns.
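
For readers planning an integration, the Method above can be pictured as a session that holds the original inputs plus an ordered list of edit instructions. The sketch below is a client-side data model only, written under assumptions: `InputPart`, `EditSession`, `add_edit`, `build_request`, and the request field names are all hypothetical and do not come from the source or from any published Google API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical types for illustration only; not part of any Google API.
ALLOWED_KINDS = {"image", "audio", "video", "text"}


@dataclass
class InputPart:
    kind: str     # one of ALLOWED_KINDS
    content: str  # prompt text, or a file path for media inputs

    def __post_init__(self) -> None:
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported input kind: {self.kind}")


@dataclass
class EditSession:
    parts: List[InputPart]
    edits: List[str] = field(default_factory=list)

    def add_edit(self, instruction: str) -> None:
        """Record one conversational edit, preserving turn order."""
        instruction = instruction.strip()
        if not instruction:
            raise ValueError("edit instruction must not be empty")
        self.edits.append(instruction)

    def build_request(self) -> dict:
        """Bundle the original inputs with every edit so far, in order."""
        return {
            "inputs": [{"kind": p.kind, "content": p.content} for p in self.parts],
            "turns": list(self.edits),
        }


session = EditSession(parts=[
    InputPart("image", "storyboard.png"),  # assumed local file name
    InputPart("text", "A paper boat drifting down a rainy street"),
])
session.add_edit("Make it night and add street lamps")
session.add_edit("Keep the same boat, but slow the water down")
request = session.build_request()
```

Carrying the full edit history in each request is one simple way for a client to preserve context across turns. The source does not say how the service itself tracks scene state, so treat this as a design assumption.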

In practice

Generate explainer videos from short prompts.
Transform existing video scenes or actions.
Create videos using your own digital avatar.

Topics

Gemini Omni Flash
Multimodal AI
Video Generation
Conversational Editing
Generative AI
Digital Watermarking

Best for: Product Manager, Machine Learning Engineer, Computer Vision Engineer, AI Scientist, AI Engineer, AI Product Manager

Editorial summary, takeaway, and curation by AIssential. Original article published by Google DeepMind News.