Proposing Topic Models and Evaluation Frameworks for Analyzing Associations with External Outcomes: An Application to Leadership Analysis Using Large-Scale Corporate Review Data

2026-04-22 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Corporate Strategy & Leadership · Depth: Expert, extended

Summary

A new topic modeling method and evaluation framework are proposed for analyzing associations between text data and external outcomes, with an application to leadership analysis using large-scale corporate review data. The method uses large language models (LLMs) to generate topics that are at once interpretable, specific (aligned with concrete actions), and consistent in polarity stance (free of mixed positive and negative evaluations within a topic). The evaluation framework explicitly incorporates these criteria, alongside automated evaluation methods for existing metrics. Using reviews from OpenWork, a major Japanese corporate review platform, the proposed method showed better interpretability, specificity, and polarity stance consistency than existing methods such as NMF and BERTopic. Its explanatory power was consistently higher for external outcomes such as employee morale, but not consistently higher for Return on Assets (ROA). The study analyzed reviews of 1,356 Japanese publicly listed firms from 2017 to 2024, using GPT-4.1-mini for topic generation and Gemini-2.5-Flash for evaluation.
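To make "explanatory power" concrete, here is a minimal sketch of one way to compare topic features against an outcome. The summary does not say how the study measured explanatory power, so the choice of a univariate R² is an assumption, and `r_squared` and the toy numbers are purely illustrative. A fuller analysis would likely use multiple or panel regression.

```python
def r_squared(x, y):
    """R^2 of a simple linear regression of y on x.

    Assumption: explanatory power is proxied by univariate R^2;
    the source does not specify the statistic that was used.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return (sxy * sxy) / (sxx * syy)


# Toy, made-up values: per-firm topic prevalence vs. an outcome score.
outcome = [1.0, 3.0, 5.0, 7.0]
topic_a_share = [0.0, 1.0, 2.0, 3.0]   # tracks the outcome closely
topic_b_share = [1.0, 0.0, 1.0, 0.0]   # weakly related to the outcome

better = r_squared(topic_a_share, outcome) > r_squared(topic_b_share, outcome)
```

Comparing the same statistic across topic sets from different models (for example NMF, BERTopic, and the LLM-refined topics) is what lets the outcomes be ranked by how well each topic set explains them.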

Key takeaway

AI scientists and research scientists developing or applying topic models for outcome-oriented analysis should consider integrating LLM-driven refinement steps to improve topic interpretability, specificity, and polarity stance consistency. In the corporate review application, this approach yielded topics with greater explanatory power for external outcomes such as employee morale (though not consistently for ROA), offering more actionable insights than traditional methods. The proposed evaluation framework can be adopted to validate models against these criteria, especially when analyzing nuanced sentiment or specific behavioral patterns.

Key insights

LLM-enhanced topic modeling improves interpretability, specificity, and polarity consistency for outcome-oriented text analysis.

Principles

Topic interpretability is crucial for actionable insights.
Specificity and polarity consistency enhance topic utility.
LLMs can refine topic assignments and split topics by sentiment.

Method

The proposed method refines BERTopic outputs using LLMs for topic assignment, polarity-based splitting, and semantic integration, then evaluates topics with LLM-based metrics for specificity and polarity consistency.
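One possible way to score polarity stance consistency for a topic is sketched below. In the study the judgments come from an LLM evaluator; here the per-document polarity labels are simply passed in, and the majority-share formula and the name `polarity_consistency` are assumptions for illustration, not the paper's definition.

```python
from collections import Counter


def polarity_consistency(labels):
    """Share of a topic's documents that carry its majority polarity.

    `labels` holds one polarity string per document assigned to the topic
    (assumed here to come from an LLM judge). Only "positive" and
    "negative" are counted; returns None if neither appears.
    """
    counts = Counter(label for label in labels if label in ("positive", "negative"))
    total = sum(counts.values())
    if total == 0:
        return None
    return max(counts.values()) / total


# A topic mixing stances scores lower than one with a single stance.
mixed = polarity_consistency(["positive", "positive", "negative", "positive"])
clean = polarity_consistency(["negative", "negative", "negative"])
```

A score of 1.0 means every judged document in the topic shares one stance, while a score near 0.5 signals the mixed positive and negative evaluations that the method aims to eliminate.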

In practice

Use LLMs to refine initial topic clusters.
Split topics by polarity to avoid mixed sentiment.
Integrate semantically similar topics while preserving polarity.
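The splitting and integration steps above can be sketched as plain data transformations. This is a minimal illustration under stated assumptions: the polarity label of each review and the `canonical` mapping of similar topic names would be produced by an LLM in practice, and the function names and sample reviews are hypothetical.

```python
from collections import defaultdict


def split_by_polarity(docs):
    """Group (topic, polarity, text) records into one bucket per (topic, polarity)."""
    buckets = defaultdict(list)
    for topic, polarity, text in docs:
        buckets[(topic, polarity)].append(text)
    return dict(buckets)


def merge_similar(buckets, canonical):
    """Merge topics that `canonical` maps to the same label, never across polarity."""
    merged = defaultdict(list)
    for (topic, polarity), texts in buckets.items():
        merged[(canonical.get(topic, topic), polarity)].extend(texts)
    return dict(merged)


# Hypothetical reviews; polarity labels are assumed to come from an LLM.
docs = [
    ("manager feedback", "positive", "My manager gives clear feedback."),
    ("manager feedback", "negative", "Feedback from managers is rare."),
    ("supervisor feedback", "positive", "Supervisors review work weekly."),
]
# Hypothetical LLM judgment that two topic names mean the same thing.
canonical = {"supervisor feedback": "manager feedback"}

topics = merge_similar(split_by_polarity(docs), canonical)
```

Keying every bucket on the (topic, polarity) pair is the design choice that keeps the merge from reintroducing mixed sentiment: two semantically similar topics are combined only when their stance matches.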

Topics

Topic Modeling
Large Language Models
Leadership Analysis
Corporate Review Data
Topic Specificity

Code references

bashtage/linearmodels

Best for: NLP Engineer, AI Scientist, Research Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.