From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media
Summary
This study evaluates automated visual discourse analysis for climate change communication on social media, using images posted on X (formerly Twitter). The researchers benchmarked six promptable vision-language models (VLMs) and 15 zero-shot CLIP-like models on two datasets: an expert-annotated set of 1,038 images and a larger corpus of over 1.2 million images with 50,000 manually validated labels. The evaluation covered five annotation dimensions: animal content, climate change consequences, climate action, image setting, and image type. Gemini-3.1-flash-lite performed best across all super-categories and on both datasets, although its lead over moderate-sized open-weight models was relatively small. The study also found that VLMs can reliably recover population-level trends despite only moderate per-image accuracy, which makes them suitable for large-scale discourse analysis.
Key takeaway
For research scientists and AI engineers analyzing large-scale visual social media data, consider integrating VLMs into automated discourse analysis. Although per-image accuracy varies, these models reliably capture population-level trends, which makes it efficient to identify communication strategies at scale. To improve results, focus on prompt design tailored to each annotation dimension rather than on complex chain-of-thought reasoning.
Key insights
VLMs can effectively analyze social media images for climate discourse, even with moderate per-image accuracy.
Principles
- Distributional evaluation is key for VLM discourse analysis.
- Prompt design improves VLM performance.
- Chain-of-thought reasoning can reduce VLM performance.
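The distributional-evaluation principle can be illustrated with a minimal Python sketch that contrasts an instance-level metric (per-image accuracy) with a distributional one (total variation distance between predicted and true label shares). The category names and labels are invented toy data, and total variation distance is our illustrative choice of metric, not necessarily the one used in the study.

```python
from collections import Counter


def label_distribution(labels, categories):
    """Share of images assigned to each category (a population-level view)."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts.get(c, 0) / n for c in categories}


def per_image_accuracy(y_true, y_pred):
    """Instance-level metric: fraction of images labeled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def total_variation(p, q):
    """Distributional metric: 0 means identical label shares, 1 means disjoint."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in p)


# Toy labels for an "image type" dimension (illustrative, not study data).
categories = ["photo", "chart", "meme"]
y_true = ["photo"] * 5 + ["chart"] * 3 + ["meme"] * 2
y_pred = ["chart", "meme", "photo", "photo", "photo",
          "photo", "chart", "chart", "photo", "meme"]

acc = per_image_accuracy(y_true, y_pred)
tv = total_variation(label_distribution(y_true, categories),
                     label_distribution(y_pred, categories))
print(f"per-image accuracy: {acc:.2f}, total variation: {tv:.2f}")
# prints: per-image accuracy: 0.60, total variation: 0.00
```

In this toy case the four mistakes cancel out, so the model is wrong on 40% of images yet reproduces the label shares exactly. Real errors are rarely this symmetric, so the distributional gap has to be measured rather than assumed.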
Method
The study benchmarks promptable VLMs and zero-shot CLIP-like models on social media image datasets, evaluating performance across multiple annotation dimensions and advocating for distributional evaluation over instance-level metrics.
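Zero-shot CLIP-like classifiers typically work by comparing an image embedding with text embeddings of one prompt per candidate label and picking the most similar label. The sketch below assumes the embeddings have already been produced by some CLIP-style encoder; the toy 3-d vectors, the "image setting" label names, and the `logit_scale` of 100 (a value commonly used with CLIP) are assumptions, not details taken from the study.

```python
import numpy as np


def zero_shot_classify(image_emb, text_embs, labels, logit_scale=100.0):
    """Pick the label whose text embedding is most similar to the image embedding."""
    # L2-normalize so that dot products become cosine similarities.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * (txt @ img)
    # Numerically stable softmax over the candidate labels.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return labels[int(np.argmax(probs))], probs


# Toy embeddings standing in for encoder outputs of prompts such as
# "a photo taken outdoors" (hypothetical values, not from a real model).
labels = ["indoor", "outdoor", "unclear"]
text_embs = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
image_emb = np.array([0.2, 0.9, 0.1])

label, probs = zero_shot_classify(image_emb, text_embs, labels)
print(label, probs.round(3))
```

Unlike a promptable VLM, this approach needs no text generation: the candidate labels are fixed in advance, so the output is always one of them.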
In practice
- Use Gemini-3.1-flash-lite for the best VLM performance; moderate-sized open-weight models trail only slightly.
- Prioritize prompt design over chain-of-thought.
- Focus on population trends for large-scale analysis.
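The prompt-design advice can be made concrete with a minimal sketch that builds one closed-set prompt per annotation dimension and asks for the label directly instead of step-by-step reasoning. The dimension keys, label sets, and the `build_prompt` and `parse_answer` helpers are hypothetical: this summary does not give the study's prompts or codebook categories. Sending the prompt to a model is omitted because it depends on the provider's API.

```python
# Hypothetical label sets: the study's actual codebook categories are not
# listed in this summary.
DIMENSIONS = {
    "image_type": ["photograph", "illustration", "chart", "screenshot"],
    "image_setting": ["indoor", "outdoor", "unclear"],
}


def build_prompt(dimension):
    """One focused, closed-set question per annotation dimension."""
    options = ", ".join(DIMENSIONS[dimension])
    return (
        f"Classify this image by its {dimension.replace('_', ' ')}. "
        f"Answer with exactly one of: {options}. "
        "Reply with the label only, without explaining your reasoning."
    )


def parse_answer(dimension, response):
    """Map a raw model reply onto an allowed label, or None if it is off-list."""
    answer = response.strip().lower().rstrip(".")
    return answer if answer in DIMENSIONS[dimension] else None


print(build_prompt("image_setting"))
print(parse_answer("image_setting", " Outdoor.\n"))  # outdoor
print(parse_answer("image_type", "Probably a chart"))  # None
```

Restricting each reply to a fixed label list keeps parsing deterministic, and off-list replies surface as `None` so they can be counted instead of being silently mapped to a category.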
Topics
- Visual Discourse Analysis
- Vision-Language Models
- Climate Change Communication
- Social Media Analysis
- Prompt Engineering
Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.