I compared how Gemini, ChatGPT, and Claude can analyze videos - this model wins

· Source: News and Advice on the World's Latest Innovations | ZDNET · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Intermediate, extended

Summary

ZDNET compared three major AI offerings (Gemini Pro, ChatGPT Plus, and Claude Max) on their ability to "watch" and interpret video content from YouTube URLs and local MP4/MOV files. The tests covered a YouTube video on annealing, a silent MP4 drone test, and a local MOV file of a walk-and-talk. Gemini Pro demonstrated superior native video understanding, accurately describing actions in the silent drone video and summarizing complex verbal content. ChatGPT Plus could not process videos directly, but achieved similar results when paired with the OpenAI Codex app, which downloaded the videos and analyzed them with Python scripts. Claude Max failed to process any video content and explicitly stated that it lacks the capability. The tools that could handle video processed 15-minute videos in 2-3 minutes, showing strong interpretative skills and potential for applications like security footage analysis and content summarization.

Key takeaway

For Machine Learning Engineers evaluating multimodal AI capabilities, Gemini Pro offers robust, native video understanding for diverse formats, making it ideal for direct video analysis. If your workflow is heavily invested in OpenAI, you can achieve similar video processing by integrating ChatGPT Plus with OpenAI Codex, though it requires more setup and scripting. Consider these differences when choosing a platform for video-centric applications like content summarization or automated visual analysis.

Key insights

Gemini Pro excels at native video understanding, while ChatGPT requires external tools like Codex for similar capabilities.

Principles

Method

Test AI models by prompting them to "watch" videos from YouTube links and from local files (MP4, MOV). Including local files and silent footage means a model cannot answer from a title, description, or other metadata alone. Evaluate each model's ability to summarize content, describe actions, and generate related imagery.
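The method above could be organized as a small, repeatable harness. The following is a minimal sketch in Python, not the procedure ZDNET used: the VideoCase schema, the file names, the placeholder YouTube URL, the expected terms, and the ask_model callable are all hypothetical, and the keyword check is only a crude stand-in for a human reading each answer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VideoCase:
    """One test video plus terms a correct answer should mention (hypothetical schema)."""
    name: str
    source: str  # YouTube URL or local file path (placeholder values below)
    prompt: str
    expected_terms: List[str] = field(default_factory=list)


def build_prompt(case: VideoCase) -> str:
    """Ask the model to rely on the footage itself, not on titles or descriptions."""
    return (
        f"{case.prompt}\n"
        "Base your answer only on what you can see and hear in the video, "
        "not on its title, description, or other metadata. "
        "If you cannot access the video, say so."
    )


def score_response(response: str, expected_terms: List[str]) -> float:
    """Return the fraction of expected terms found in the response."""
    if not expected_terms:
        return 0.0
    text = response.lower()
    hits = sum(1 for term in expected_terms if term.lower() in text)
    return hits / len(expected_terms)


def run_cases(
    ask_model: Callable[[str, str], str], cases: List[VideoCase]
) -> Dict[str, float]:
    """Send every case to ask_model(prompt, source) and score each reply."""
    return {
        case.name: score_response(
            ask_model(build_prompt(case), case.source), case.expected_terms
        )
        for case in cases
    }


# Placeholder cases mirroring the three video types in the summary.
# Sources and expected terms are assumptions for illustration only.
CASES = [
    VideoCase(
        name="youtube_annealing",
        source="https://www.youtube.com/watch?v=PLACEHOLDER",
        prompt="Watch this video and summarize what it explains.",
        expected_terms=["annealing"],
    ),
    VideoCase(
        name="silent_drone_mp4",
        source="drone_test.mp4",
        prompt="Watch this silent video and describe what happens.",
        expected_terms=["drone"],
    ),
    VideoCase(
        name="walk_and_talk_mov",
        source="walk_and_talk.mov",
        prompt="Watch this video and summarize what the speaker says.",
        expected_terms=["walking"],
    ),
]
```

To use the sketch, pass run_cases a function that takes a prompt and a video source and returns the model's text reply; how that function reaches a given model (web UI transcript, API, or a tool such as Codex) is left to the tester. A silent clip is a useful case here because a correct description cannot come from a transcript.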

Best for: Machine Learning Engineer, Computer Vision Engineer, AI Engineer, AI Product Manager, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by News and Advice on the World's Latest Innovations | ZDNET.