OmniGCD: Abstracting Generalized Category Discovery for Modality Agnosticism
Summary
OmniGCD introduces a modality-agnostic approach to Generalized Category Discovery (GCD), the task of identifying both known and novel classes in partially labeled data. Previous GCD methods are confined to a single modality and require dataset-specific fine-tuning. OmniGCD instead encodes inputs with modality-specific encoders (e.g., vision, audio, text, remote sensing) and reduces the dimensionality of the resulting features to form a GCD latent space. At test time, a Transformer-based model trained on synthetic data transforms this latent space so that it clusters better. The authors evaluate OmniGCD in a zero-shot GCD setting, which prohibits dataset-specific fine-tuning, on 16 datasets spanning four modalities. Trained once on synthetic data, OmniGCD improves classification accuracy over baselines by +6.2 for vision, +17.9 for text, +1.5 for audio, and +12.7 for remote sensing.
Key takeaway
For research scientists developing generalized category discovery systems, OmniGCD demonstrates that training once on synthetic data can enable robust zero-shot performance across multiple modalities. You should consider adopting a decoupled approach where representation learning is separated from the category discovery process to enhance scalability and reduce dataset-specific fine-tuning requirements.
Key insights
OmniGCD enables zero-shot, modality-agnostic category discovery by decoupling representation learning from category discovery.
Principles
- Decouple representation learning from category discovery.
- Synthetic data training can enable zero-shot generalization.
Method
OmniGCD uses modality-specific encoders, dimension reduction to a GCD latent space, and a Transformer-based model for test-time representation transformation.
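The flow described above can be sketched as a three-stage pipeline. This is a minimal illustration and not the authors' implementation: PCA and k-means are stand-ins chosen for this sketch, random vectors stand in for frozen encoder features, and `identity_transform` is a hypothetical placeholder for the synthetically trained Transformer, whose interface the summary does not specify.

```python
import numpy as np

def reduce_dim(features, out_dim):
    """PCA via SVD (an assumed choice): project centered features
    onto the top `out_dim` principal directions."""
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:out_dim].T

def identity_transform(latents, labels, labeled_mask):
    """Hypothetical placeholder for the test-time Transformer.
    It returns the latent space unchanged."""
    return latents

def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means (an assumed clustering step)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return assign

def discover_categories(features, labels, labeled_mask, k,
                        latent_dim=8, transform=identity_transform):
    """Encoder features -> reduced latent space -> transform -> clusters."""
    latents = reduce_dim(features, latent_dim)
    latents = transform(latents, labels, labeled_mask)
    return kmeans(latents, k)

# Random vectors standing in for the output of a frozen modality encoder.
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 32))
labels = rng.integers(0, 3, size=60)      # only meaningful where labeled_mask is True
labeled_mask = rng.random(60) < 0.3
assignments = discover_categories(features, labels, labeled_mask, k=5)
```

Because the encoder output is the only modality-specific input, swapping the encoder leaves the remaining stages untouched, which is the decoupling the summary emphasizes.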
In practice
- Utilize strong encoders for diverse modalities.
- Explore synthetic data for model pre-training.
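As a starting point for experimenting with synthetic pre-training data, the sketch below generates a toy GCD task: Gaussian clusters in which some classes are partially labeled (known) and the rest are entirely unlabeled (novel). The summary does not describe how OmniGCD's synthetic data is produced, so the Gaussian-mixture design, the function name `make_synthetic_gcd_task`, and all parameters here are assumptions for illustration only.

```python
import numpy as np

def make_synthetic_gcd_task(n_known=3, n_novel=2, per_class=40, dim=8,
                            labeled_frac=0.5, spread=0.3, seed=0):
    """Toy GCD task: one Gaussian cluster per class (assumed design)."""
    rng = np.random.default_rng(seed)
    n_classes = n_known + n_novel
    centers = rng.normal(0.0, 1.0, size=(n_classes, dim))
    labels = np.repeat(np.arange(n_classes), per_class)
    points = centers[labels] + spread * rng.normal(size=(len(labels), dim))
    # Only samples from known classes may be labeled;
    # novel classes stay fully unlabeled.
    is_known = labels < n_known
    labeled_mask = is_known & (rng.random(len(labels)) < labeled_frac)
    return points, labels, labeled_mask

points, labels, labeled_mask = make_synthetic_gcd_task()
```

Varying `n_known`, `n_novel`, and `spread` across seeds yields many tasks of differing difficulty, which is one plausible way to expose a model to diverse category structures without touching real datasets.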
Topics
- OmniGCD
- Generalized Category Discovery
- Modality Agnosticism
- Zero-shot Learning
- GCD Latent Space
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.