Proposing Topic Models and Evaluation Frameworks for Analyzing Associations with External Outcomes: An Application to Leadership Analysis Using Large-Scale Corporate Review Data
Summary
The paper proposes a new topic modeling method and evaluation framework for analyzing associations between text data and external outcomes, and applies them to leadership analysis using large-scale corporate review data. The method uses large language models (LLMs) to generate topics that are at once interpretable, specific (aligned with concrete actions), and consistent in polarity stance (free of mixed positive and negative evaluations within a topic). The evaluation framework adds specificity and polarity stance consistency as explicit criteria and automates the measurement of existing metrics. On reviews from OpenWork, a major Japanese corporate review platform, the proposed method scored higher on interpretability, specificity, and polarity stance consistency than existing methods such as NMF and BERTopic. It also showed consistently higher explanatory power for employee morale, although its advantage was not consistent for Return on Assets (ROA). The study analyzed reviews of 1,356 Japanese publicly listed firms from 2017 to 2024, using GPT-4.1-mini for topic generation and Gemini-2.5-Flash for evaluation.
Key takeaway
If you are an AI Scientist or Research Scientist developing or applying topic models for outcome-oriented analysis, integrate LLM-driven refinement steps to improve topic interpretability, specificity, and polarity stance consistency. On corporate review data, this approach produced topics with greater explanatory power for employee morale than traditional methods, though the gain did not hold consistently for ROA. Consider adopting the proposed evaluation framework to validate your models against these criteria, especially when analyzing nuanced sentiment or specific behavioral patterns.
Key insights
LLM-enhanced topic modeling improves interpretability, specificity, and polarity consistency for outcome-oriented text analysis.
Principles
- Topic interpretability is crucial for actionable insights.
- Specificity and polarity consistency enhance topic utility.
- LLMs can refine topic assignments and split topics by sentiment.
Method
The proposed method refines BERTopic outputs using LLMs for topic assignment, polarity-based splitting, and semantic integration, then evaluates topics with LLM-based metrics for specificity and polarity consistency.
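To make the polarity consistency criterion concrete, here is a minimal sketch of one way such a score could be computed. It is an illustration, not the paper's metric: it assumes each document already carries a polarity label (in the paper this judgment comes from an LLM) and scores a topic by the share of its majority polarity. The function name `polarity_consistency` and the sample topic names are hypothetical.

```python
from collections import Counter, defaultdict


def polarity_consistency(assignments):
    """Score each topic by the share of its majority polarity.

    assignments: iterable of (topic_id, polarity) pairs.
    Returns {topic_id: score}, where 1.0 means no mixed polarity.
    Assumption: polarity labels are given; this is not the paper's formula.
    """
    by_topic = defaultdict(Counter)
    for topic_id, polarity in assignments:
        by_topic[topic_id][polarity] += 1
    return {
        topic: max(counts.values()) / sum(counts.values())
        for topic, counts in by_topic.items()
    }


# Toy labelled data (hypothetical topics and labels).
labelled = [
    ("management_style", "positive"),
    ("management_style", "negative"),
    ("management_style", "positive"),
    ("management_style", "positive"),
    ("overtime_pressure", "negative"),
    ("overtime_pressure", "negative"),
]
scores = polarity_consistency(labelled)
print(scores)
```

A topic scoring well below 1.0 under a measure like this mixes praise and criticism, which is the situation the polarity-based splitting step is meant to remove.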
In practice
- Use LLMs to refine initial topic clusters.
- Split topics by polarity to avoid mixed sentiment.
- Integrate semantically similar topics while preserving polarity.
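The three steps above can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not the authors' implementation: `label_polarity` is a hypothetical keyword-based stand-in for an LLM polarity judgment, and `jaccard` on topic labels stands in for an LLM or embedding-based semantic similarity check. All names, the cue lexicon, and the threshold are invented for the example.

```python
from collections import defaultdict

# Toy lexicon; a real pipeline would ask an LLM instead (assumption).
POSITIVE_CUES = {"supportive", "clear", "fair"}


def label_polarity(text):
    """Hypothetical stand-in for an LLM polarity call."""
    words = set(text.lower().split())
    return "positive" if words & POSITIVE_CUES else "negative"


def split_by_polarity(clusters):
    """Split each initial cluster into (label, polarity) sub-topics."""
    split = defaultdict(list)
    for label, docs in clusters.items():
        for doc in docs:
            split[(label, label_polarity(doc))].append(doc)
    return dict(split)


def jaccard(a, b):
    """Token overlap of two topic labels (stand-in for semantic similarity)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)


def merge_similar(split_topics, threshold=0.5):
    """Greedily merge similar topics, but only within the same polarity."""
    merged = {}
    for (label, polarity), docs in split_topics.items():
        for kept_label, kept_polarity in merged:
            if kept_polarity == polarity and jaccard(label, kept_label) >= threshold:
                merged[(kept_label, kept_polarity)].extend(docs)
                break
        else:
            merged[(label, polarity)] = list(docs)
    return merged


# Initial clusters, e.g. as produced by a first-pass topic model.
clusters = {
    "manager feedback": [
        "clear feedback from my manager",
        "manager ignores feedback requests",
    ],
    "manager feedback quality": ["supportive manager with fair reviews"],
    "overtime": ["overtime is expected every week"],
}
merged = merge_similar(split_by_polarity(clusters))
for key, docs in merged.items():
    print(key, len(docs))
```

Because merging is restricted to topics with the same polarity, the positive and negative sides of "manager feedback" stay separate even though their labels are identical.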
Topics
- Topic Modeling
- Large Language Models
- Leadership Analysis
- Corporate Review Data
- Topic Specificity
Best for: NLP Engineer, AI Scientist, Research Scientist, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.