When New Generators Arrive: Lifelong Machine-Generated Text Attribution via Ridge Feature Transfer
Summary
Machine-generated text (MGT) attribution aims to identify the specific generator that produced a given text, which matters for model accountability and misuse investigation. As new large language models emerge, attribution models must keep incorporating new generators while preserving recognition of previously seen ones. RidgeFT is proposed as a lightweight analytic update framework that does this without exemplar replay. It trains a task-aware encoder on an initial generator set and stores compact class-wise sufficient statistics. The encoder is then frozen, so every later update is replay-free and closed-form. RidgeFT suppresses generator-irrelevant variation through covariance calibration and increases representation capacity with fixed random features. New classes are added via closed-form ridge regression on class-level sufficient statistics. Across multi-topic evaluations with varying initial generator setups, RidgeFT is reported to consistently outperform baselines, achieving the best macro-F1 score across domains, backbones, and incremental protocols and improving both old-class retention and new-class adaptation.
Key takeaway
For NLP engineers and AI scientists who must continuously update machine-generated text attribution models, RidgeFT offers an alternative to replay-based methods. Its replay-free, analytic update framework is reported to achieve higher macro-F1 than the baselines while balancing adaptation to new generators with retention of old classes. Consider feature-stable analytic updates as a way to simplify lifelong learning pipelines and support model accountability.
Key insights
Lifelong MGT attribution can be effectively achieved through feature-stable analytic updates without exemplar replay.
Principles
- Lifelong learning requires balancing new adaptation with old retention.
- Feature-stable analytic updates simplify continuous model adaptation.
- Covariance calibration suppresses irrelevant feature variations.
Method
RidgeFT trains a task-aware encoder on an initial generator set, stores class-wise sufficient statistics, and freezes the encoder. New classes are then added via closed-form ridge regression, combined with covariance calibration and fixed random features.
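To make the closed-form update concrete, here is a minimal sketch of a ridge classifier head that is refit from accumulated statistics rather than stored examples. It is an illustration, not RidgeFT's actual implementation: the class name `AnalyticRidgeHead`, the choice of statistics (a shared Gram matrix plus one feature sum per class), one-hot regression targets, and the regularizer `lam` are all assumptions. Features are assumed to come from an already frozen encoder.

```python
import numpy as np


class AnalyticRidgeHead:
    """Hypothetical ridge head refit in closed form from sufficient statistics."""

    def __init__(self, dim, lam=1.0):
        self.lam = lam                      # ridge regularization strength (assumed)
        self.gram = np.zeros((dim, dim))    # running sum of x x^T over all samples
        self.class_sums = {}                # label -> running sum of feature vectors
        self.labels = []                    # class order used for the weight columns
        self.W = None

    def add_class(self, label, feats):
        """Fold one generator's features into the statistics and re-solve."""
        feats = np.asarray(feats, dtype=float)
        self.gram += feats.T @ feats
        if label not in self.class_sums:
            self.class_sums[label] = np.zeros(feats.shape[1])
            self.labels.append(label)
        self.class_sums[label] += feats.sum(axis=0)
        self._solve()

    def _solve(self):
        # With one-hot targets Y, X^T Y has one column per class,
        # and that column is simply the sum of the class's features.
        cross = np.stack([self.class_sums[l] for l in self.labels], axis=1)
        reg = self.gram + self.lam * np.eye(self.gram.shape[0])
        self.W = np.linalg.solve(reg, cross)

    def predict(self, feats):
        scores = np.asarray(feats, dtype=float) @ self.W
        return [self.labels[i] for i in scores.argmax(axis=1)]
```

Because only the Gram matrix and the per-class sums are kept, adding a generator never requires revisiting text from earlier generators, and under these assumptions the resulting weights equal those of a ridge regression fit jointly on all data seen so far.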
In practice
- Implement replay-free updates for MGT attribution.
- Use covariance calibration to suppress generator-irrelevant feature variation.
- Apply fixed random features to increase representation capacity.
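The summary does not spell out how covariance calibration or the random features are computed, so the sketch below shows one plausible reading rather than the paper's procedure. It assumes calibration means whitening with a shrunk pooled within-class covariance, and that the random features are a fixed Gaussian projection followed by a ReLU. The function names, the `shrink` parameter, and the order of the two steps are all hypothetical.

```python
import numpy as np


def make_random_features(in_dim, out_dim, seed=0):
    """Return a fixed (never trained) random projection followed by ReLU."""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)

    def transform(feats):
        return np.maximum(np.asarray(feats, dtype=float) @ proj, 0.0)

    return transform


def within_class_whitener(feats, labels, shrink=0.1):
    """Assumed calibration: inverse square root of the pooled within-class covariance."""
    feats = np.asarray(feats, dtype=float)
    labels = np.asarray(labels)
    centered = feats.copy()
    for c in np.unique(labels):
        mask = labels == c
        centered[mask] -= feats[mask].mean(axis=0)   # remove each class mean
    cov = centered.T @ centered / len(feats)
    dim = cov.shape[0]
    # Shrink toward a scaled identity so the inverse square root stays stable.
    cov = (1.0 - shrink) * cov + shrink * (np.trace(cov) / dim) * np.eye(dim)
    evals, evecs = np.linalg.eigh(cov)
    evals = np.maximum(evals, 1e-12)                 # guard against round-off
    return evecs @ np.diag(evals ** -0.5) @ evecs.T
```

Under these assumptions, frozen-encoder features `x` would be mapped as `phi(x @ whitener)` before being passed to the analytic classifier, with `phi` created once by `make_random_features` and reused unchanged for every later generator.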
Topics
- Machine-Generated Text Attribution
- Lifelong Learning
- Ridge Regression
- Covariance Calibration
- Large Language Models
- Incremental Learning
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.