Let's go Bananas with GenMedia — Guillaume Vernade, Google DeepMind

· Source: AI Engineer · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, extended

Summary

Google DeepMind's GenMedia models, including Nano Banana 2, Veo 3.1 Light, and Lyria, generate images, video, and music. Nano Banana 2 adds new aspect ratios and image grounding for enhanced search, while Veo 3.1 Light offers lower-cost video generation at $0.05 per second. Lyria, the music generation model, can create 30-second clips or full 3-minute songs, and a real-time variant allows the music to be changed dynamically. The presentation walked through a practical application: illustrating an open-source book by using Gemini to write prompts and the GenMedia models to produce the matching images, video, and audio. It also contrasted the Gemini API in Google AI Studio with Vertex AI, presenting the former as the developer-friendly option and the latter as the one with enterprise-grade control.
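
To make the per-second pricing concrete, here is a minimal cost estimate in Python. The only figure taken from the summary is the $0.05 per generated second; the function name, the clip length, and the chapter count are illustrative assumptions, and actual billing may include other factors.

```python
from decimal import Decimal

# Price quoted in the summary for the light video model (USD per generated second).
PRICE_PER_SECOND = Decimal("0.05")


def video_cost(seconds_per_clip: int, clips: int,
               price_per_second: Decimal = PRICE_PER_SECOND) -> Decimal:
    """Estimate the USD cost of generating `clips` videos of equal length."""
    if seconds_per_clip < 0 or clips < 0:
        raise ValueError("duration and clip count must be non-negative")
    return price_per_second * seconds_per_clip * clips


# Hypothetical workload: one 8-second clip per chapter of a 12-chapter book.
estimate = video_cost(seconds_per_clip=8, clips=12)
print(f"${estimate:.2f}")  # prints "$4.80"
```

Using `Decimal` rather than floats keeps the estimate exact to the cent, which matters once the clip count grows during iterative development.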

Key takeaway

AI engineers building multimodal applications should explore DeepMind's GenMedia models, and in particular use Gemini to generate the prompts so that the output stays consistent and high quality. Consider the Interactions API for stateful context management to improve performance and cost, especially with large inputs such as entire books. Check regional model availability and pricing, and iterate with the cheaper "light" models before upscaling.

Key insights

DeepMind's GenMedia models offer multimodal AI for creative content generation, leveraging Gemini for intelligent prompting.

Method

Use Gemini to generate structured prompts for the characters and scenes of a book, then feed those prompts to the GenMedia models (Nano Banana 2, Veo, Lyria) to create consistent images, videos, and music. Character references can optionally be passed along to keep the visuals consistent.
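
The method can be sketched as a two-stage pipeline. This is a minimal, offline illustration and not the speaker's code: `ScenePrompt`, `Illustration`, `illustrate_book`, and the two stub functions are hypothetical names, and the stubs stand in for the real Gemini and image-model calls so the example runs without credentials.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ScenePrompt:
    """Structured prompt for one scene (the role Gemini plays in the method)."""
    chapter: int
    description: str
    characters: list[str] = field(default_factory=list)


@dataclass
class Illustration:
    chapter: int
    prompt: str
    reference_ids: list[str]
    image: bytes


# Placeholder signatures for the two model calls (assumptions, not a real SDK).
PromptWriter = Callable[[str], list[ScenePrompt]]
ImageMaker = Callable[[str, list[str]], bytes]


def illustrate_book(book_text: str, write_prompts: PromptWriter,
                    make_image: ImageMaker,
                    character_refs: dict[str, str]) -> list[Illustration]:
    """Stage 1: turn the book into scene prompts. Stage 2: render each scene,
    attaching the reference of every known character that appears in it."""
    results = []
    for scene in write_prompts(book_text):
        refs = [character_refs[name] for name in scene.characters
                if name in character_refs]
        image = make_image(scene.description, refs)
        results.append(Illustration(scene.chapter, scene.description, refs, image))
    return results


def fake_prompt_writer(book_text: str) -> list[ScenePrompt]:
    """Stub for the prompt model: one scene per non-empty paragraph."""
    paragraphs = [p.strip() for p in book_text.split("\n\n") if p.strip()]
    return [
        ScenePrompt(chapter=i, description=f"Illustration of: {p}",
                    characters=["Alice"] if "Alice" in p else [])
        for i, p in enumerate(paragraphs, start=1)
    ]


def fake_image_maker(prompt: str, reference_ids: list[str]) -> bytes:
    """Stub for the image model: returns bytes that encode its inputs."""
    return f"{prompt}|refs={','.join(reference_ids)}".encode()


book = "Alice falls down the rabbit hole.\n\nThe garden is empty."
pages = illustrate_book(book, fake_prompt_writer, fake_image_maker,
                        {"Alice": "ref-alice-01"})
for page in pages:
    print(page.chapter, page.reference_ids)
```

Passing the model calls in as plain callables keeps the pipeline testable offline; in a real project each stub would be replaced by a function that wraps the corresponding API request, and the same shape extends to video or music by adding further callables.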

Best for: AI Engineer, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.