Methodology
Summary
This report details the methodology used to study social media health and wellness influencers and the experiences of U.S. adults who consume their content. It combines two approaches: an analysis of 6,828 health and wellness influencers across 12,800 accounts on Instagram, TikTok, and YouTube, and a nationally representative survey of U.S. adults. The survey data come from two waves of Pew Research Center's American Trends Panel (ATP), conducted in June and October 2025, with 5,023 and 5,111 respondents respectively. Each wave has a cumulative response rate of 3% and a margin of error of roughly ±1.6 to ±1.7 percentage points. Influencers were identified through keyword searches, podcast chart analysis, and AI classification, with GPT-4.1 mini and GPT-5.1 used for verification, credential classification, and gender classification; the report describes the resulting accuracy as high. The methodology also covers multi-stage weighting of the survey data and detailed matching of influencer accounts across platforms.
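The sample sizes and margins of error quoted above can be related with a short calculation. The sketch below is not taken from the report: it computes the margin of error a simple random sample of the same size would give, then backs out the design effect implied by a reported margin. The function names are hypothetical, and the 95% confidence level (z = 1.96) and p = 0.5 are assumed conventions rather than details stated in this summary.

```python
import math

def srs_margin_of_error(n, p=0.5, z=1.96):
    """Margin of error in percentage points for a simple random sample.

    Assumes a 95% confidence level (z = 1.96) and the worst case p = 0.5.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)

def implied_design_effect(reported_moe, n):
    """Design effect implied by a reported margin of error (percentage points)."""
    return (reported_moe / srs_margin_of_error(n)) ** 2

# Sample sizes of the two ATP waves, as given in the summary.
for n in (5023, 5111):
    print(n, round(srs_margin_of_error(n), 2))
```

Under these assumptions, both waves would have a simple-random-sample margin of about ±1.4 points. The reported ±1.6 to ±1.7 is wider, which is what one would expect once weighting inflates the variance of the estimates.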
Key takeaway
Research scientists designing studies of social media content and audience perception should combine large-scale social media API data with nationally representative survey panels. AI models such as GPT-4.1 mini and GPT-5.1 can make content classification and demographic inference efficient, but their output should always be validated with human coding and manual review to limit false positives, especially where definitions are nuanced. This hybrid approach improves data quality and generalizability.
Key insights
This methodology combines large-scale social media analysis with national survey data to study health influencers and audience engagement.
Principles
- Oversampling improves the precision of subgroup estimates.
- Multi-stage weighting corrects for unequal selection probabilities and nonresponse.
- AI models can classify social media content at scale.
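The first two principles can be illustrated with a small raking (iterative proportional fitting) sketch. This is a generic illustration, not the report's weighting procedure: the variables, categories, population targets, and base weights below are invented, and the helper names are hypothetical. The oversampled group starts with a smaller base weight, and raking then adjusts the weights until the weighted sample matches each population margin.

```python
def rake(rows, targets, base_key="base_weight", max_iter=100, tol=1e-8):
    """Rake base weights so weighted margins match target population shares.

    rows: list of dicts, one per respondent.
    targets: {variable: {category: population share}}.
    Returns a list of final weights aligned with rows.
    """
    weights = [row[base_key] for row in rows]
    for _ in range(max_iter):
        max_shift = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            sums = {}
            for w, row in zip(weights, rows):
                sums[row[var]] = sums.get(row[var], 0.0) + w
            # Scale each category so its weighted share equals the target.
            factors = {cat: shares[cat] * total / sums[cat] for cat in sums}
            weights = [w * factors[row[var]] for w, row in zip(weights, rows)]
            max_shift = max(max_shift, max(abs(f - 1) for f in factors.values()))
        if max_shift < tol:
            break
    return weights

def weighted_share(rows, weights, var, cat):
    """Weighted share of respondents whose `var` equals `cat`."""
    hit = sum(w for w, row in zip(weights, rows) if row[var] == cat)
    return hit / sum(weights)

def effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / sum of squared weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Invented data: the 18-29 group is oversampled, so its base weight is smaller.
rows = (
    [{"age": "18-29", "sex": "F", "base_weight": 0.5}] * 3
    + [{"age": "18-29", "sex": "M", "base_weight": 0.5}]
    + [{"age": "30+", "sex": "F", "base_weight": 1.0}] * 2
    + [{"age": "30+", "sex": "M", "base_weight": 1.0}] * 2
)
# Invented population targets, for illustration only.
targets = {"age": {"18-29": 0.2, "30+": 0.8}, "sex": {"F": 0.51, "M": 0.49}}
weights = rake(rows, targets)
```

Oversampling gives a subgroup more interviews and therefore tighter subgroup estimates. The price is unequal weights, which reduce the effective sample size for full-sample estimates, as `effective_sample_size(weights)` shows when compared with `len(rows)`.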
Method
The survey drew on a panel recruited through address-based sampling, and responses were adjusted with multi-stage weighting. The influencer analysis involved keyword searches, API data collection, AI classification (GPT-4.1 mini, GPT-5.1) of content, credentials, and gender, and cross-platform account matching.
In practice
- Use the Modash Raw API for real-time social data.
- Employ GPT-4.1 mini for content classification.
- Implement langdetect for language filtering.
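The language-filtering step in the list above could look roughly like the sketch below. It assumes the goal is to keep English text, which this summary does not state. The function name and the `detect_fn` parameter are hypothetical, added so the filter can be exercised without the library installed. The default path uses the `langdetect` package's `detect` function and sets `DetectorFactory.seed` so results are repeatable.

```python
def filter_by_language(texts, target_lang="en", detect_fn=None):
    """Keep only texts whose detected language code equals target_lang.

    detect_fn is any callable mapping a string to a language code. If it is
    omitted, langdetect's detect() is used (requires `pip install langdetect`).
    """
    if detect_fn is None:
        from langdetect import DetectorFactory, detect
        DetectorFactory.seed = 0  # langdetect is non-deterministic without a seed
        detect_fn = detect
    kept = []
    for text in texts:
        try:
            if detect_fn(text) == target_lang:
                kept.append(text)
        except Exception:
            # langdetect raises an exception on empty or feature-less text
            # (for example, emoji-only bios); such items are dropped here.
            continue
    return kept

# Usage with a stand-in detector, so the example runs without langdetect.
def toy_detector(text):
    if not text.strip():
        raise ValueError("no features in text")
    return "en" if " the " in f" {text.lower()} " else "es"

bios = ["Sharing the science of sleep", "Consejos de salud y bienestar", ""]
english_bios = filter_by_language(bios, detect_fn=toy_detector)
```

Short profile bios and captions give a detector little to work with, so a filter like this is best paired with the manual review recommended in the key takeaway.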
Topics
- Social Media Influencers
- Health and Wellness
- Survey Methodology
- American Trends Panel
- AI Content Classification
Best for: Research Scientist, Data Scientist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Pew Research Center.