CobwebTM: Probabilistic Concept Formation for Lifelong and Hierarchical Topic Modeling

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Natural Language Processing · Depth: Expert, quick

Summary

CobwebTM is a low-parameter, lifelong hierarchical topic model for unsupervised topic discovery and dynamic topic creation in text corpora. It adapts Cobweb, an incremental probabilistic concept formation algorithm, to continuous document embeddings, so it can build semantic hierarchies online without fixing the number of topics in advance. The model targets limitations of both neural approaches, which suffer from catastrophic forgetting and fixed capacity, and classical probabilistic models, which lack flexibility for streaming data. Across the datasets evaluated, CobwebTM is reported to achieve strong topic coherence, stable topics over time, and high-quality hierarchical organization, suggesting that combining incremental symbolic concept formation with pretrained representations is an efficient approach to topic modeling.

Key takeaway

For research scientists building topic models for dynamic or streaming text, CobwebTM offers an alternative to neural and classical probabilistic methods. Because it builds semantic hierarchies online and creates topics without a preset topic count, it can reduce tuning effort and mitigate catastrophic forgetting, which may improve adaptability and long-term performance on evolving datasets.

Key insights

CobwebTM uses incremental probabilistic concept formation with document embeddings for lifelong, hierarchical topic modeling.

Principles

Method

Adapts the Cobweb algorithm to continuous document embeddings to build semantic hierarchies incrementally, enabling unsupervised topic discovery and organization.
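
To make the incremental idea concrete, here is a minimal sketch, not the authors' implementation. It builds a concept tree one embedding at a time, keeping only a running mean and a count per node. As a loud simplification, a distance threshold stands in for Cobweb's category-utility search, and the merge and split operators are omitted. The names `Node`, `insert`, `radius`, `shrink`, and `max_depth` are hypothetical, and the toy 2-D vectors stand in for real document embeddings from a pretrained encoder.

```python
import numpy as np


class Node:
    """One concept in the tree: a running mean over the embeddings it absorbed."""

    def __init__(self, dim):
        self.count = 0
        self.mean = np.zeros(dim)
        self.children = []

    def update(self, x):
        # Incremental mean update; past documents are never stored or revisited.
        self.count += 1
        self.mean += (x - self.mean) / self.count


def insert(root, x, radius=1.0, shrink=0.5, max_depth=3):
    """Route one embedding down the tree, opening new concepts when nothing fits.

    ASSUMPTION: the distance threshold replaces Cobweb's category utility.
    """
    x = np.asarray(x, dtype=float)
    node, depth = root, 0
    node.update(x)
    while depth < max_depth:
        # Closest existing child by Euclidean distance to its mean, if any.
        best = min(node.children,
                   key=lambda c: np.linalg.norm(x - c.mean),
                   default=None)
        if best is None or np.linalg.norm(x - best.mean) > radius:
            # No child is close enough: create a new concept (a new topic).
            best = Node(x.shape[0])
            best.update(x)
            node.children.append(best)
            return best
        # Otherwise absorb x into the child and keep descending,
        # with a tighter radius so deeper levels hold finer subtopics.
        best.update(x)
        node, depth = best, depth + 1
        radius *= shrink
    return node


# Toy stream: two well-separated groups of "documents", seen one at a time.
root = Node(2)
for doc in [[0, 0], [0.1, 0], [10, 10], [10, 10.1], [0, 0.1]]:
    insert(root, doc)

print(len(root.children))  # 2 top-level topics
```

The number of topics is never specified: the second top-level concept appears only when the third document arrives and fits no existing child. Each insert touches one root-to-leaf path, which is what makes this style of update suitable for streaming data.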

In practice

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.