Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest
Summary
This study presents a comprehensive evaluation of modern Large Language Models (LLMs), including GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT, across three core social media analytics tasks using a Twitter (X) dataset. The tasks include Social Media Authorship Verification, Social Media Post Generation, and User Attribute Inference. For authorship verification, a systematic sampling framework was introduced, and generalization was tested on newly collected tweets from January 2024 to mitigate "seen-data" bias. Post generation assessed LLMs' ability to produce authentic, user-like content, with a user study measuring real users' perceptions of LLM-generated posts. For attribute inference, occupations and interests were annotated using IAB Tech Lab and 2018 U.S. SOC taxonomies, benchmarking LLMs against existing baselines. The research establishes reproducible benchmarks and provides new insights into LLM capabilities in social media analytics.
Key takeaway
For AI Engineers and Research Scientists developing social media analytics solutions, this evaluation highlights specific LLM strengths. GPT-4's superior performance in authorship verification, particularly on unseen data, suggests it is a strong choice for forensic or content validation tasks. Gemini 1.5 Pro's accuracy in user attribute inference, even at fine-grained levels, makes it suitable for advanced user profiling. When selecting models for post generation, weigh the trade-offs between lexical reuse, semantic similarity, and human-perceived authenticity, since these measures do not necessarily favor the same model.
Key insights
Modern LLMs demonstrate varied capabilities across social media analytics tasks, with GPT-4 excelling in authorship verification and Gemini in attribute inference.
Principles
- Controlled sampling, combined with evaluation on newly collected data, mitigates "seen-data" bias in LLM evaluation.
- Multifaceted evaluation is crucial for assessing LLMs in social media contexts.
- Lexical reuse can enhance perceived authenticity in generated content.
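To make the notion of lexical reuse concrete, here is a minimal sketch that scores a generated post by the share of its word n-grams that already occur in the user's past posts. The summary does not say how the study measures lexical reuse, so this definition, the `lexical_reuse` function name, the whitespace tokenization, and the toy posts are all illustrative assumptions.

```python
def ngrams(tokens, n):
    """Return the set of contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def lexical_reuse(generated, history, n=2):
    """Fraction of the generated post's n-grams that also occur in the history.

    Hypothetical metric for illustration only; not the study's definition.
    """
    generated_ngrams = ngrams(generated.lower().split(), n)
    if not generated_ngrams:
        return 0.0  # post is shorter than n tokens
    history_ngrams = set()
    for post in history:
        history_ngrams |= ngrams(post.lower().split(), n)
    return len(generated_ngrams & history_ngrams) / len(generated_ngrams)


# Toy data, invented for this example.
history = ["big win for the team tonight", "so proud of the team"]
high_reuse = lexical_reuse("proud of the team tonight", history)
low_reuse = lexical_reuse("the team lost badly", history)
```

A score like this captures only surface overlap. It says nothing about semantic similarity or about how authentic a post feels to readers, which is why the study pairs automatic metrics with a user study.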
Method
The study employs a multi-task evaluation framework for LLMs on Twitter data: systematic sampling for authorship verification, a user study of human perception for post generation, and standardized taxonomies (IAB Tech Lab and 2018 U.S. SOC) for attribute inference, alongside automatic metrics.
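The unseen-data check mentioned in the summary (testing on tweets collected from January 2024) can be pictured as a simple time-based split. The sketch below is an assumption about how such a split might look, not the study's pipeline: the `created_at` field, the `split_by_time` helper, and the exact cutoff of 1 January 2024 UTC are hypothetical.

```python
from datetime import datetime, timezone

# Assumed cutoff: tweets on or after this instant count as "newly collected".
HOLDOUT_START = datetime(2024, 1, 1, tzinfo=timezone.utc)


def split_by_time(tweets, holdout_start=HOLDOUT_START):
    """Split tweets into (older, newer) lists around a cutoff timestamp.

    Older tweets may have been seen during model training; newer ones are
    less likely to have been, so they serve as the generalization set.
    """
    older, newer = [], []
    for tweet in tweets:
        if tweet["created_at"] >= holdout_start:
            newer.append(tweet)
        else:
            older.append(tweet)
    return older, newer


# Toy records, invented for this example.
tweets = [
    {"id": 1, "created_at": datetime(2023, 12, 31, 23, 59, tzinfo=timezone.utc)},
    {"id": 2, "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 3, "created_at": datetime(2024, 1, 15, tzinfo=timezone.utc)},
]
older, newer = split_by_time(tweets)
```

Comparing a model's scores on the two lists gives a rough indication of how much its performance depends on data it may already have seen.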
In practice
- Use GPT-4 for robust social media authorship verification.
- Consider Gemini 1.5 Pro for accurate user attribute inference, especially at fine granularity.
- Implement diverse negative sampling strategies for authorship verification.
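The negative-sampling recommendation above can be sketched as follows. The summary does not detail the study's sampling framework, so the two strategies shown here (a random post by another author, and a "hard" negative chosen by highest word overlap with the anchor post) are assumptions, as are the function names and the toy data.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Word-level Jaccard similarity between two posts."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def sample_negative(anchor, author, posts_by_user, strategy, rng):
    """Pick a post by a different author to pair with the anchor post."""
    candidates = [
        post
        for user, posts in sorted(posts_by_user.items())
        if user != author
        for post in posts
    ]
    if strategy == "random":
        return rng.choice(candidates)
    if strategy == "hard":
        # Hard negative: the other-author post most similar to the anchor.
        return max(candidates, key=lambda post: jaccard(anchor, post))
    raise ValueError(f"unknown strategy: {strategy}")


def build_pairs(posts_by_user, strategies=("random", "hard"), seed=0):
    """Return (post_a, post_b, label) triples; label 1 means same author."""
    rng = random.Random(seed)
    pairs = []
    for author in sorted(posts_by_user):
        for a, b in combinations(posts_by_user[author], 2):
            pairs.append((a, b, 1))
            for strategy in strategies:
                negative = sample_negative(a, author, posts_by_user, strategy, rng)
                pairs.append((a, negative, 0))
    return pairs


# Toy data, invented for this example.
posts_by_user = {
    "alice": ["coffee and code all day", "shipping code before coffee"],
    "bob": ["match day with the lads", "coffee and code all night"],
    "carol": ["new recipe for sourdough", "baking bread all weekend"],
}
pairs = build_pairs(posts_by_user)
```

Mixing easy and hard negatives keeps a verifier from succeeding on topic differences alone, since hard negatives share vocabulary with the anchor while coming from a different author.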
Topics
- Large Language Models
- Social Media Analytics
- Authorship Verification
- Post Generation
- User Attribute Inference
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.