Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

This study presents a comprehensive evaluation of modern Large Language Models (LLMs), including GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT, across three core social media analytics tasks using a Twitter (X) dataset: Social Media Authorship Verification, Social Media Post Generation, and User Attribute Inference. For authorship verification, the authors introduce a systematic sampling framework and test generalization on newly collected tweets from January 2024 to mitigate "seen-data" bias, that is, the risk that evaluation tweets already appeared in a model's training data. Post generation assesses how well LLMs produce authentic, user-like content, with a user study measuring how real users perceive LLM-generated posts. For attribute inference, interests were annotated using the IAB Tech Lab taxonomy and occupations using the 2018 U.S. Standard Occupational Classification (SOC), and LLMs were benchmarked against existing baselines. The research establishes reproducible benchmarks and provides new insights into LLM capabilities in social media analytics.
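The summary does not spell out the sampling framework itself. As a rough illustration only, the sketch below shows one common way to build a balanced authorship-verification set: pair tweets by the same author (positives) and by different authors (negatives) under a fixed random seed, and hold out tweets posted on or after 1 January 2024 as an "unseen" split. The `Tweet` type, `split_by_date`, `sample_pairs`, and the cutoff handling are hypothetical names and choices, not the authors' code.

```python
import random
from datetime import date
from itertools import combinations
from typing import NamedTuple


class Tweet(NamedTuple):
    author: str
    text: str
    posted: date


# Assumed cutoff: tweets posted on or after this date are treated as "unseen",
# i.e. unlikely to appear in the evaluated models' training data.
UNSEEN_CUTOFF = date(2024, 1, 1)


def split_by_date(tweets, cutoff=UNSEEN_CUTOFF):
    """Split tweets into (seen, unseen) lists by posting date."""
    seen = [t for t in tweets if t.posted < cutoff]
    unseen = [t for t in tweets if t.posted >= cutoff]
    return seen, unseen


def sample_pairs(tweets, n_pairs, seed=0):
    """Draw n_pairs (tweet_a, tweet_b, same_author) triples, about half positive.

    Positives pair two tweets by the same author; negatives pair tweets by
    different authors. A fixed seed makes the sample reproducible.
    Enumerating all pairs is quadratic, which is fine for a small sketch.
    """
    by_author = {}
    for t in tweets:
        by_author.setdefault(t.author, []).append(t)

    positives = [(a, b, True) for posts in by_author.values()
                 for a, b in combinations(posts, 2)]
    negatives = [(a, b, False) for a, b in combinations(tweets, 2)
                 if a.author != b.author]

    half = n_pairs // 2
    if len(positives) < half or len(negatives) < n_pairs - half:
        raise ValueError("not enough tweets for a balanced sample")

    rng = random.Random(seed)
    pairs = rng.sample(positives, half) + rng.sample(negatives, n_pairs - half)
    rng.shuffle(pairs)
    return pairs
```

A typical call would be `seen, unseen = split_by_date(tweets)` followed by `sample_pairs(unseen, 1000, seed=42)`, so that the generalization test draws only from the post-cutoff tweets while the seed keeps the benchmark reproducible.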

Key takeaway

For AI Engineers and Research Scientists developing social media analytics solutions, this evaluation highlights specific LLM strengths. GPT-4's superior performance in authorship verification, particularly on unseen data, suggests it is a strong candidate for forensic or content-validation tasks. Gemini 1.5 Pro's accuracy in user attribute inference, even at fine-grained levels, makes it suitable for advanced user profiling. When selecting a model for post generation, weigh the trade-offs between lexical reuse, semantic similarity, and human-perceived authenticity, since these measures do not necessarily agree.
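The tension between lexical reuse and semantic similarity can be made concrete with two simple scores. This sketch assumes lexical reuse is measured as the share of a generated post's word bigrams that already appear in the user's past posts, and that semantic similarity is the cosine between sentence-embedding vectors from a model of your choice (the embedding step is not shown). `word_ngrams`, `lexical_reuse`, and `cosine` are hypothetical helpers; the paper's actual metrics may differ.

```python
import math


def word_ngrams(text, n):
    """Lower-cased whitespace tokens grouped into n-grams."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_reuse(generated, history, n=2):
    """Share of the generated post's n-grams already present in past posts."""
    gen = word_ngrams(generated, n)
    if not gen:
        return 0.0
    past = set()
    for post in history:
        past.update(word_ngrams(post, n))
    return sum(g in past for g in gen) / len(gen)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (e.g. sentence embeddings)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```

A post that copies a user's earlier phrasing scores 1.0 on `lexical_reuse` and will often sit close in embedding space as well, yet readers may still judge it less authentic than a fresh post in the same voice. Automatic scores of this kind are therefore best read alongside a human perception study.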

Key insights

Modern LLMs show varied capabilities across social media analytics tasks, with GPT-4 excelling in authorship verification and Gemini 1.5 Pro in attribute inference.

Principles

Method

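The study employs a multi-task evaluation framework for LLMs on Twitter data, combining systematic sampling for authorship verification, a human perception study for post generation, and standardized taxonomies (IAB Tech Lab and 2018 U.S. SOC) for attribute inference, alongside automatic metrics such as lexical reuse and semantic similarity.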
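Attribute inference against a hierarchical taxonomy can be scored at more than one granularity. As an assumed illustration, the sketch below compares gold and predicted 2018 SOC codes (format `XX-XXXX`) at the detailed-occupation level (exact code match) and at the major-group level (the first two digits). The helper names and example codes are hypothetical, and the study's own scoring protocol may differ.

```python
def soc_major_group(code):
    """Major group of a 2018 SOC code such as '15-1252' (its first two digits)."""
    major, sep, rest = code.partition("-")
    if not sep or len(major) != 2 or len(rest) != 4:
        raise ValueError(f"not an SOC code: {code!r}")
    return major


def accuracy_by_level(gold, pred):
    """Accuracy at the detailed (exact code) and major-group levels."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and aligned")
    n = len(gold)
    detailed = sum(g == p for g, p in zip(gold, pred)) / n
    major = sum(soc_major_group(g) == soc_major_group(p)
                for g, p in zip(gold, pred)) / n
    return {"detailed": detailed, "major": major}


# Hypothetical gold labels and model predictions.
gold = ["15-1252", "29-1141", "15-2051", "25-2021"]
pred = ["15-1252", "29-1141", "15-1252", "27-1024"]
print(accuracy_by_level(gold, pred))  # {'detailed': 0.5, 'major': 0.75}
```

The third prediction misses the exact occupation but lands in the correct major group, so it counts at the coarse level only. Reporting both numbers separates models that are roughly right from those that are precisely right.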

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.