From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media

2026-04-23 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A study evaluates automated visual discourse analysis of climate change communication on social media, using images from X (formerly Twitter). The researchers benchmarked six promptable vision-language models (VLMs) and 15 zero-shot CLIP-like models on two datasets: an expert-annotated set of 1,038 images and a larger corpus of over 1.2 million images with 50,000 manually validated labels. The evaluation spanned five annotation dimensions: animal content, climate change consequences, climate action, image setting, and image type. Gemini-3.1-flash-lite performed best across all super-categories and both datasets, though its lead over moderate-sized open-weight models was relatively small. The study also found that VLMs can reliably recover population-level trends despite moderate per-image accuracy, which makes them suitable for large-scale discourse analysis.

Key takeaway

For research scientists and AI engineers analyzing large-scale visual social media data, VLMs are worth considering for automated discourse analysis. Although per-image accuracy is only moderate, these models reliably capture population-level trends, which is what large-scale studies of communication strategies depend on. To improve results, optimize prompt design for each annotation dimension rather than adding complex chain-of-thought reasoning.

Key insights

VLMs can effectively analyze social media images for climate discourse, even with moderate per-image accuracy.

Principles

Distributional evaluation is key for VLM discourse analysis.
Prompt design improves VLM performance.
Chain-of-thought reasoning can reduce VLM performance.
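
To make the first principle concrete, the following minimal sketch contrasts an instance-level metric (per-image accuracy) with a distribution-level one. The category names and labels are invented toy data, and total variation distance is an illustrative choice, not necessarily the measure used in the study.

```python
from collections import Counter

def accuracy(true_labels, pred_labels):
    """Instance-level metric: fraction of images labeled correctly."""
    matches = sum(t == p for t, p in zip(true_labels, pred_labels))
    return matches / len(true_labels)

def label_shares(labels, categories):
    """Population-level view: share of images assigned to each category."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in categories}

def total_variation(p, q):
    """Total variation distance between two label distributions (0 = identical)."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in p)

# Hypothetical categories and toy labels, not taken from the study.
categories = ["photo", "chart", "illustration"]
true_labels = ["photo"] * 6 + ["chart"] * 3 + ["illustration"]
pred_labels = ["photo"] * 4 + ["chart"] * 2 + ["photo"] * 2 + ["chart", "illustration"]

acc = accuracy(true_labels, pred_labels)
tvd = total_variation(label_shares(true_labels, categories),
                      label_shares(pred_labels, categories))
print(f"per-image accuracy: {acc:.2f}")
print(f"total variation distance: {tvd:.2f}")
```

In this toy case the per-image errors cancel out, so the predicted category shares match the true shares even though four of ten images are mislabeled. Real errors are rarely this symmetric, which is why the distributional agreement has to be measured rather than assumed.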

Method

The study benchmarks promptable VLMs and zero-shot CLIP-like models on social media image datasets, evaluating performance across multiple annotation dimensions and advocating for distributional evaluation over instance-level metrics.
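
For readers unfamiliar with the zero-shot CLIP-like setting, here is a rough sketch of the general idea: an image is assigned the label whose text embedding is most similar to the image embedding. The embeddings below are made-up three-dimensional vectors and the label names are hypothetical. In a real pipeline they would come from a CLIP-like image and text encoder, and the study's exact setup may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_label(image_emb, text_embs):
    """Return the label whose text embedding is closest to the image embedding."""
    return max(text_embs, key=lambda label: cosine(image_emb, text_embs[label]))

# Toy stand-ins for encoder outputs (assumption: real embeddings have
# hundreds of dimensions and come from a pretrained model).
text_embs = {
    "indoor": [1.0, 0.0, 0.0],
    "outdoor": [0.0, 1.0, 0.0],
}
image_emb = [0.2, 0.9, 0.1]

predicted = zero_shot_label(image_emb, text_embs)
print(predicted)
```

No labeled training data is involved: adding a category only requires a text embedding for it, which is what makes this family of models cheap to apply to a new codebook.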

In practice

Use Gemini-3.1-flash-lite for top VLM performance.
Prioritize prompt design over chain-of-thought.
Focus on population trends for large-scale analysis.
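
One way to act on the prompt-design advice is to write a short, dimension-specific prompt that asks for a single label and no explanation, then validate the answer against the allowed labels. The sketch below is only an illustration: the function names, prompt wording, and label set are hypothetical, and no model is actually called.

```python
def build_prompt(dimension, labels):
    """Build a direct-answer prompt for one annotation dimension (no chain-of-thought)."""
    options = ", ".join(labels)
    return (f"Classify the image by {dimension}. "
            f"Answer with exactly one of: {options}. "
            "Do not explain your answer.")

def parse_label(response, labels):
    """Normalize a model response; return None if it is not an allowed label."""
    cleaned = response.strip().strip(".").lower()
    return cleaned if cleaned in labels else None

# Hypothetical label set for the "image setting" dimension.
setting_labels = ["indoor", "outdoor", "unclear"]
prompt = build_prompt("image setting", setting_labels)
print(prompt)

# Simulated responses standing in for real VLM output.
print(parse_label(" Outdoor.\n", setting_labels))
print(parse_label("I think it is outdoor", setting_labels))
```

Rejecting responses that fall outside the label set keeps free-text answers from silently skewing the aggregated category shares.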

Topics

Visual Discourse Analysis
Vision-Language Models
Climate Change Communication
Social Media Analysis
Prompt Engineering

Code references

KathPra/Codebooks2VLMs

Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.