Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport
Summary
Human-centric Topic Modeling (Human-TM) is a new task formulation that directly integrates human-provided goals into the topic modeling process to generate interpretable, diverse, and goal-oriented topics. Traditional methods, including LDA and neural/LLM-based approaches, often yield statistically coherent but redundant or off-target topics that fail to capture user intent. To address this, the Goal-prompted Contrastive Topic Model with Optimal Transport (GCTM-OT) is proposed. GCTM-OT first employs LLM-based prompting to extract goal candidates from documents, then integrates these candidates into semantic-aware contrastive learning using optimal transport for topic discovery. Experimental results on three public subreddit datasets demonstrate that GCTM-OT surpasses existing baselines in topic coherence and diversity, while also significantly enhancing alignment with human-provided goals.
Key takeaway
Research scientists developing topic modeling systems should consider integrating explicit human-provided goals into their models. This approach, exemplified by GCTM-OT, can significantly improve topic interpretability and alignment with user intent, beyond what purely statistical coherence delivers. Evaluate systems not only on diversity and coherence, but also on how well the resulting topics meet specific human objectives.
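As a rough illustration of the diversity side of that evaluation, the sketch below computes one common definition of topic diversity: the fraction of unique words among the top words of all topics. This is an assumption for illustration; the summary does not say which diversity metric the original article uses, and the function name and toy topics are hypothetical.

```python
from typing import List


def topic_diversity(topics: List[List[str]], top_k: int = 10) -> float:
    """Fraction of unique words among the top_k words of every topic.

    1.0 means no word is shared between topics; values near 0 mean the
    topics are highly redundant. (One common definition, assumed here.)
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


# Toy topics (hypothetical data): the word "game" is shared by both.
topics = [
    ["game", "team", "season"],
    ["game", "player", "coach"],
]
score = topic_diversity(topics, top_k=3)
```

A goal-alignment score would need to be reported alongside this, since a set of topics can be diverse and still miss what the user asked for.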
Key insights
Integrating human-provided goals directly into topic modeling improves topic interpretability, diversity, and goal alignment.
Principles
- Statistical coherence alone does not guarantee that topics capture user intent.
- LLMs can extract goal candidates from documents.
Method
GCTM-OT uses LLM prompting for goal candidate extraction, then applies semantic-aware contrastive learning via optimal transport for topic discovery, aligning topics with human goals.
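To make the optimal transport step more concrete, here is a minimal sketch of entropy-regularized optimal transport (Sinkhorn iterations) between a set of topic embeddings and a set of goal-candidate embeddings. It is not the GCTM-OT implementation: the embeddings are random placeholders, the cosine cost, uniform weights, and regularization strength are assumptions, and the function names are hypothetical.

```python
import numpy as np


def cosine_cost(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pairwise cost 1 - cosine similarity between rows of x and rows of y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn_plan(cost, a, b, reg=0.5, n_iters=500):
    """Entropy-regularized transport plan between weight vectors a and b."""
    kernel = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)    # rescale to match column marginals
        u = a / (kernel @ v)      # rescale to match row marginals
    return u[:, None] * kernel * v[None, :]


rng = np.random.default_rng(0)
topic_emb = rng.normal(size=(4, 16))  # placeholder topic embeddings
goal_emb = rng.normal(size=(3, 16))   # placeholder goal-candidate embeddings

a = np.full(4, 1 / 4)                 # uniform mass over topics (assumption)
b = np.full(3, 1 / 3)                 # uniform mass over goals (assumption)

cost = cosine_cost(topic_emb, goal_emb)
plan = sinkhorn_plan(cost, a, b)
ot_distance = float((plan * cost).sum())  # transport cost under the plan
```

Each entry of `plan` says how much mass from a topic is matched to a goal candidate. In a training setup, a quantity like `ot_distance` could serve as a differentiable alignment term, though how GCTM-OT combines it with the contrastive objective is not specified in this summary.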
In practice
- Prompt an LLM to extract goal candidates from documents before topic discovery.
- Apply optimal transport to align topics semantically with those goal candidates.
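The goal-extraction step might look something like the sketch below. The prompt wording, the function names, and the `llm` callable are all hypothetical; the summary does not give the prompts used by GCTM-OT, and a stub stands in for a real model so the example runs offline.

```python
import re
from typing import Callable, List


def build_goal_prompt(document: str, max_goals: int = 3) -> str:
    """Build a prompt asking for short goal candidates (wording is an assumption)."""
    return (
        f"Read the document below and list up to {max_goals} goals a reader "
        "might be pursuing, one per line.\n\n"
        f"Document:\n{document}\n\nGoals:"
    )


def parse_goal_candidates(response: str, max_goals: int = 3) -> List[str]:
    """Strip list markers, drop blanks and duplicates, keep at most max_goals."""
    goals: List[str] = []
    for line in response.splitlines():
        cleaned = re.sub(r"^\s*(?:[-*]|\d+[.)])\s*", "", line).strip()
        if cleaned and cleaned not in goals:
            goals.append(cleaned)
    return goals[:max_goals]


def extract_goals(
    document: str, llm: Callable[[str], str], max_goals: int = 3
) -> List[str]:
    """Run the prompt through any text-in, text-out model and parse the reply."""
    return parse_goal_candidates(llm(build_goal_prompt(document, max_goals)), max_goals)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call, returning a canned reply."""
    return "- compare budget laptops\n- find battery life reviews\n"


goals = extract_goals("Which laptop under $500 has the best battery?", fake_llm)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider; the extracted strings would then be embedded and used as goal candidates in the alignment step.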
Topics
- Human-centric Topic Modeling
- Goal-prompted Contrastive Learning
- Optimal Transport
- LLM-based Prompting
- Topic Coherence
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.