Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest

2026-04-22 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

This study presents a comprehensive evaluation of modern Large Language Models (LLMs), including GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT, across three core social media analytics tasks using a Twitter (X) dataset. The tasks include Social Media Authorship Verification, Social Media Post Generation, and User Attribute Inference. For authorship verification, a systematic sampling framework was introduced, and generalization was tested on newly collected tweets from January 2024 to mitigate "seen-data" bias. Post generation assessed LLMs' ability to produce authentic, user-like content, with a user study measuring real users' perceptions of LLM-generated posts. For attribute inference, occupations and interests were annotated using IAB Tech Lab and 2018 U.S. SOC taxonomies, benchmarking LLMs against existing baselines. The research establishes reproducible benchmarks and provides new insights into LLM capabilities in social media analytics.
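
The "seen-data" control described above amounts to a temporal hold-out: posts written after a cutoff date cannot have appeared in a model's training data if that data ends before the cutoff. A minimal sketch in Python follows. The record layout, the function name `split_by_time`, and the exact cutoff are assumptions for illustration; the summary states only that the new tweets date from January 2024.

```python
from datetime import datetime, timezone

# Assumed cutoff for illustration; the summary only says the newly
# collected tweets date from January 2024.
CUTOFF = datetime(2024, 1, 1, tzinfo=timezone.utc)

def split_by_time(tweets, cutoff=CUTOFF):
    """Split tweet records into possibly-seen and unseen sets.

    Each record is a dict with a timezone-aware `created_at` datetime
    (a hypothetical layout, not the paper's data format).
    """
    seen = [t for t in tweets if t["created_at"] < cutoff]
    unseen = [t for t in tweets if t["created_at"] >= cutoff]
    return seen, unseen
```

Reporting verification scores separately on the two sets shows how much of a model's performance survives on posts it could not have memorized.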

Key takeaway

For AI Engineers and Research Scientists building social media analytics solutions, this evaluation highlights model-specific strengths. GPT-4's superior performance in authorship verification, particularly on unseen data, suggests it is a strong choice for forensic or content validation tasks. Gemini 1.5 Pro's accuracy in user attribute inference, even at fine-grained levels, makes it suitable for advanced user profiling. When selecting models for post generation, weigh the trade-offs between lexical reuse, semantic similarity, and human-perceived authenticity.

Key insights

Modern LLMs show uneven capabilities across social media analytics tasks: GPT-4 leads in authorship verification, and Gemini 1.5 Pro leads in attribute inference.

Principles

Controlled sampling mitigates "seen-data" bias in LLM evaluation.
Multifaceted evaluation is crucial for assessing LLMs in social media contexts.
Lexical reuse can enhance perceived authenticity in generated content.
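
To make "lexical reuse" concrete, one simple way to quantify it is the share of n-grams in a generated post that already occur in the user's earlier posts. The sketch below is an assumption for illustration, not the paper's metric: the summary does not specify which automatic measures were used, and the function names and whitespace tokenization are hypothetical choices.

```python
from collections import Counter

def ngrams(text, n):
    """Count lowercase whitespace-token n-grams in a text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def lexical_reuse(generated, history, n=2):
    """Fraction of n-grams in `generated` that occur in any post in `history`."""
    gen = ngrams(generated, n)
    if not gen:
        return 0.0  # text too short to contain any n-gram
    seen = set()
    for post in history:
        seen.update(ngrams(post, n))
    reused = sum(count for gram, count in gen.items() if gram in seen)
    return reused / sum(gen.values())
```

A high score means the generated post borrows phrasing directly from the user's history. Semantic similarity and human ratings measure different things, so the three can disagree.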

Method

The study employs a multi-task evaluation framework for LLMs on Twitter data, using systematic sampling, human perception studies, and standardized taxonomies for attribute inference, alongside various automatic metrics.
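
Because the taxonomies are hierarchical, attribute inference can be scored at more than one granularity. The sketch below assumes occupation labels in the SOC `XX-XXXX` code format, where the two digits before the hyphen identify the major group. The function names are hypothetical, and the scoring scheme is an illustration rather than the paper's evaluation protocol.

```python
def major_group(soc_code):
    """Return the two-digit major-group prefix of a code such as '15-1252'."""
    return soc_code.split("-")[0]

def accuracy(gold, pred, key=lambda code: code):
    """Share of predictions matching gold labels after mapping both with `key`."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    hits = sum(key(g) == key(p) for g, p in zip(gold, pred))
    return hits / len(gold)

# Illustrative codes in SOC format; not data from the study.
gold = ["15-1252", "29-1141", "25-2021"]
pred = ["15-1211", "29-1141", "27-3023"]

fine = accuracy(gold, pred)                     # exact-code match
coarse = accuracy(gold, pred, key=major_group)  # major-group match
```

In this toy data one of three predictions matches exactly and two of three match at the major-group level, which is the kind of gap a fine-grained comparison is meant to expose.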

In practice

Use GPT-4 for robust social media authorship verification.
Consider Gemini 1.5 Pro for accurate user attribute inference, especially at fine granularity.
Implement diverse negative sampling strategies for authorship verification.
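
As a sketch of what varied negative sampling can look like, the code below builds same-author (label 1) and different-author (label 0) post pairs with two strategies: a random other author, or the author with the most similar vocabulary as a harder negative. Everything here is an assumption for illustration, including the function names, the Jaccard similarity heuristic, and the data layout; the summary does not describe the paper's sampling framework in detail.

```python
import random

def vocab(posts):
    """Set of lowercase whitespace tokens across a list of posts."""
    return {tok for post in posts for tok in post.lower().split()}

def most_similar_author(author, posts_by_author):
    """Other author whose vocabulary has the highest Jaccard overlap."""
    base = vocab(posts_by_author[author])

    def jaccard(other):
        v = vocab(posts_by_author[other])
        union = base | v
        return len(base & v) / len(union) if union else 0.0

    others = [a for a in sorted(posts_by_author) if a != author]
    return max(others, key=jaccard)

def build_pairs(posts_by_author, n_per_author=2, strategy="random", seed=0):
    """Return (post_a, post_b, label) triples; label 1 means same author."""
    rng = random.Random(seed)
    # Keep only authors with enough posts to form a positive pair.
    usable = {a: p for a, p in posts_by_author.items() if len(p) >= 2}
    authors = sorted(usable)
    if len(authors) < 2:
        return []
    pairs = []
    for author in authors:
        posts = usable[author]
        others = [a for a in authors if a != author]
        for _ in range(n_per_author):
            first, second = rng.sample(posts, 2)
            pairs.append((first, second, 1))
            if strategy == "similar":
                other = most_similar_author(author, usable)
            else:
                other = rng.choice(others)
            pairs.append((rng.choice(posts), rng.choice(usable[other]), 0))
    return pairs
```

Random negatives are easy to tell apart when authors write about different topics. Similar-vocabulary negatives force a verifier to rely on style rather than subject matter.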

Topics

Large Language Models
Social Media Analytics
Authorship Verification
Post Generation
User Attribute Inference

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.