CobwebTM: Probabilistic Concept Formation for Lifelong and Hierarchical Topic Modeling
Summary
CobwebTM is a new low-parameter, lifelong hierarchical topic model for unsupervised topic discovery and dynamic topic creation in text corpora. It adapts Cobweb, an incremental probabilistic concept formation algorithm, to continuous document embeddings, so it can build semantic hierarchies online without fixing the number of topics in advance. The model targets limitations of both neural approaches, which suffer from catastrophic forgetting and fixed capacity, and classical probabilistic models, which lack flexibility for streaming data. Across various datasets, CobwebTM shows strong topic coherence, stable topics over time, and high-quality hierarchical organization, which suggests that combining incremental symbolic concept formation with pretrained representations is effective for topic modeling.
Key takeaway
For research scientists developing topic models for dynamic or streaming text data, CobwebTM offers an alternative to neural and classical probabilistic methods. Because it builds semantic hierarchies online and creates topics without a predefined count, it can reduce tuning effort and mitigate catastrophic forgetting, which may improve adaptability and long-term performance on evolving datasets.
Key insights
CobwebTM uses incremental probabilistic concept formation with document embeddings for lifelong, hierarchical topic modeling.
Principles
- Online construction of semantic hierarchies.
- Dynamic topic creation without predefined counts.
Method
Adapts the Cobweb algorithm to continuous document embeddings to build semantic hierarchies incrementally, enabling unsupervised topic discovery and organization.
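To make the idea of incremental hierarchy building concrete, here is a minimal Python sketch. It is not the CobwebTM implementation. Classic Cobweb decides where to place an instance by comparing operations such as adding to an existing child, creating a new child, merging, and splitting, scored with category utility. This sketch swaps that for a simple distance threshold and supports only the "add" and "create" steps. The names `ConceptNode`, `insert`, `threshold`, and `max_depth` are hypothetical, and the embeddings are assumed to be plain lists of floats produced by some pretrained encoder.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """Summarizes every embedding that has been sorted through this node."""
    dim: int
    count: int = 0
    mean: list = field(default_factory=list)
    m2: list = field(default_factory=list)  # running sum of squared deviations
    children: list = field(default_factory=list)

    def __post_init__(self):
        if not self.mean:
            self.mean = [0.0] * self.dim
        if not self.m2:
            self.m2 = [0.0] * self.dim

    def update(self, x):
        """Welford's online update: no past documents need to be stored."""
        self.count += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (xi - self.mean[i])

    def variance(self):
        """Per-dimension population variance of the embeddings seen so far."""
        if self.count == 0:
            return [0.0] * self.dim
        return [m / self.count for m in self.m2]

    def distance(self, x):
        """Euclidean distance from x to this node's mean."""
        return math.sqrt(sum((xi - mi) ** 2 for xi, mi in zip(x, self.mean)))


def insert(root, x, threshold=1.0, max_depth=2):
    """Sort one embedding down the tree, creating a child when nothing fits.

    Assumption: a distance threshold stands in for Cobweb's category
    utility. The threshold halves at each level, so deeper nodes cover
    narrower regions of the embedding space.
    """
    node = root
    node.update(x)
    for depth in range(max_depth):
        limit = threshold / (2 ** depth)
        best = min(node.children, key=lambda c: c.distance(x), default=None)
        if best is None or best.distance(x) > limit:
            # No existing child is close enough: create a new topic node.
            best = ConceptNode(dim=len(x))
            node.children.append(best)
        best.update(x)
        node = best
    return node
```

Each call to `insert` handles one document and touches only the nodes on a single root-to-leaf path, which is what lets the hierarchy grow online. The number of children is never set in advance: a new node appears whenever an embedding falls outside the reach of every existing child.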
In practice
- Unsupervised topic discovery in streaming data.
- Dynamic topic creation for evolving corpora.
Topics
- CobwebTM
- Probabilistic Concept Formation
- Lifelong Topic Modeling
- Hierarchical Topic Discovery
- Document Embeddings
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.