Memes-as-Replies: Can Models Select Humorous Manga Panel Responses?

2025-08-07 · Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

Researchers introduced the Meme Reply Selection task and the MaMe-Re (Manga Meme Reply Benchmark), a dataset of 100,000 human-annotated pairs of openly licensed Japanese manga panels and social media posts, with 500,000 total annotations from 2,325 unique annotators. The study found that large language models (LLMs) show an initial ability to capture complex social cues like exaggeration, moving beyond simple semantic matching. However, including visual information did not improve performance, indicating a gap in using visual content for contextual humor. Furthermore, while LLMs matched human judgments in controlled settings, they struggled to differentiate subtle wit among semantically similar candidates, suggesting that selecting contextually humorous replies remains a significant challenge for current models.

Key takeaway

Research scientists developing conversational AI should recognize that, while LLMs can grasp complex social cues for humor, they currently struggle with multimodal integration and with distinguishing subtle humor among semantically similar options. Prioritize improving models' ability to discern nuanced wit in text-based contexts and developing new architectures that effectively couple visual recognition with pragmatic contextual inference, rather than relying solely on scaling multimodal encoders.

Key insights

LLMs show promise in understanding social cues for humor, but struggle with visual information and subtle wit in meme reply selection.

Principles

Humor is an emergent quality of meme-context interaction.
Recontextualization is key to meme humor.
Visual information does not consistently improve humor selection.

Method

The Meme Reply Selection task involves choosing the funniest meme for a given conversational context. Each context-meme pair $(c, m)$ is assigned a funniness score $s(c,m)$, and models are evaluated with the Score@1 metric.
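
The summary does not spell out how Score@1 is computed, so the following is only a minimal sketch under one plausible reading: the model picks its single top-ranked meme for each context, and Score@1 is the mean human funniness score of those picks. The names `select_top1`, `score_at_1`, `model_score`, and the toy data layout are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: human funniness scores s(c, m) keyed by (context_id, meme_id).
HumanScores = Dict[Tuple[str, str], float]
ModelScorer = Callable[[str, str], float]


def select_top1(context: str, candidates: List[str], model_score: ModelScorer) -> str:
    """Return the candidate meme the model rates highest for this context."""
    return max(candidates, key=lambda meme: model_score(context, meme))


def score_at_1(
    candidate_sets: Dict[str, List[str]],
    human_scores: HumanScores,
    model_score: ModelScorer,
) -> float:
    """Assumed definition: mean human score of the model's top-1 pick per context."""
    picked = [
        human_scores[(context, select_top1(context, candidates, model_score))]
        for context, candidates in candidate_sets.items()
    ]
    return sum(picked) / len(picked)


# Toy data, invented purely for illustration.
toy_candidates = {"c1": ["m1", "m2"], "c2": ["m3", "m4"]}
toy_human = {
    ("c1", "m1"): 4.0, ("c1", "m2"): 2.0,
    ("c2", "m3"): 1.0, ("c2", "m4"): 5.0,
}
# Stand-in for a model: a fixed lookup table of predicted funniness.
toy_model_table = {
    ("c1", "m1"): 0.9, ("c1", "m2"): 0.1,
    ("c2", "m3"): 0.8, ("c2", "m4"): 0.2,
}


def toy_model(context: str, meme: str) -> float:
    return toy_model_table[(context, meme)]


print(score_at_1(toy_candidates, toy_human, toy_model))
```

In this toy case the stand-in model picks the human-preferred meme for `c1` but not for `c2`, so its Score@1 falls below what an oracle that always picks the human favorite would obtain.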

In practice

Use LLMs to capture social cues such as exaggeration when selecting humorous replies.
Focus on textual context over visual for meme selection.
Design evaluation settings for subtle humor distinctions.
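
One way to approach the last point is to build candidate sets whose memes are deliberately close in meaning, so that a model cannot succeed through semantic matching alone. The sketch below assumes each meme already has an embedding vector from some text or image encoder. The function names and the toy vectors are hypothetical, and this is not necessarily how the benchmark's authors constructed their settings.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_candidates(
    anchor_id: str, embeddings: Dict[str, Sequence[float]], k: int
) -> List[str]:
    """Return the anchor meme followed by its k most similar memes."""
    anchor = embeddings[anchor_id]
    others = [meme_id for meme_id in embeddings if meme_id != anchor_id]
    # Most similar first, so the set contains only near-duplicates in meaning.
    others.sort(key=lambda meme_id: cosine(anchor, embeddings[meme_id]), reverse=True)
    return [anchor_id] + others[:k]


# Toy 2-D embeddings, invented for illustration only.
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
hard_set = similar_candidates("a", toy_embeddings, k=2)
print(hard_set)
```

A candidate set built this way could then be given to human annotators and to the model under test, which makes differences in humor judgment easier to separate from differences in topical relevance.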

Topics

Meme Reply Selection
MaMe-Re Benchmark
Large Language Models
Contextual Humor
Multimodal AI

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.