When New Generators Arrive: Lifelong Machine-Generated Text Attribution via Ridge Feature Transfer

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

RidgeFT is a lightweight analytic update framework for lifelong Machine-Generated Text (MGT) attribution, where the challenge is to keep incorporating new text generators while still recognizing previously seen ones. The framework trains a task-aware encoder on an initial generator set, then freezes it and stores only compact class-wise sufficient statistics. It applies covariance calibration to suppress generator-irrelevant variation, uses fixed random features to increase representation capacity, and adds new classes via closed-form ridge regression without exemplar replay. Across multi-topic evaluations under the P3, P4, and P5 protocols on the MGT-Academic and AIGTBench datasets, RidgeFT consistently outperforms baselines. Under the P5 protocol it reaches a full F1 of 0.886, an old-class F1 of 0.902, and a new-class F1 of 0.804, improving full F1 by 0.037 and new-class F1 by 0.107 over the strongest baselines, which also demonstrates superior data efficiency.

Key takeaway

For AI scientists and machine learning engineers building machine-generated text attribution systems, RidgeFT offers a way to adapt to continuously emerging large language models without exemplar replay. Consider integrating its replay-free analytic update framework to maintain accuracy on both old and new generators, especially under low-resource conditions. The approach avoids catastrophic forgetting and costly full-model retraining, so the system can keep growing as generators are added. Be mindful of the storage needed for class statistics, though low-precision and merged statistics can significantly reduce this footprint.

Key insights

Lifelong MGT attribution benefits from frozen encoders and analytic updates, balancing new-class adaptation with old-class retention without replay.

Principles

Decouple new generator learning from deep representation updates.
Stable representations enable efficient incremental knowledge absorption.
Analytic updates via sufficient statistics avoid catastrophic forgetting.

Method

RidgeFT trains an encoder, freezes it, then uses covariance calibration, isotropic random feature lifting, and class-balanced ridge regression with sufficient statistics for replay-free, closed-form incremental updates.
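
The analytic update described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name AnalyticRidgeHead, the ReLU random projection, the per-class weighting of 1/n, and all dimensions are assumptions chosen for illustration, and the covariance calibration step is left out because the summary does not specify how it is computed. The input features stand in for the outputs of the frozen encoder.

```python
import numpy as np


class AnalyticRidgeHead:
    """Hypothetical replay-free classifier head over frozen-encoder features."""

    def __init__(self, in_dim, lifted_dim=256, ridge=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection: drawn once, never trained (assumed form).
        self.proj = rng.standard_normal((in_dim, lifted_dim)) / np.sqrt(in_dim)
        self.ridge = ridge
        # label -> (sample count, sum of lifted features, Gram matrix)
        self.stats = {}

    def lift(self, feats):
        # Random feature lifting with a ReLU nonlinearity (an assumption).
        return np.maximum(feats @ self.proj, 0.0)

    def add_class_samples(self, label, feats):
        # Only sufficient statistics are kept; raw samples are discarded.
        z = self.lift(np.asarray(feats, dtype=np.float64))
        n, s, g = self.stats.get(label, (0, 0.0, 0.0))
        self.stats[label] = (n + len(z), s + z.sum(axis=0), g + z.T @ z)

    def solve(self):
        # Closed-form ridge regression onto one-hot targets, with every
        # class given total weight 1 (each sample weighted by 1 / n_class).
        labels = sorted(self.stats)
        d = self.proj.shape[1]
        gram = np.zeros((d, d))
        cross = np.zeros((d, len(labels)))
        for j, label in enumerate(labels):
            n, s, g = self.stats[label]
            gram += g / n
            cross[:, j] = s / n
        weights = np.linalg.solve(gram + self.ridge * np.eye(d), cross)
        return labels, weights

    def predict(self, feats):
        labels, weights = self.solve()
        scores = self.lift(np.asarray(feats, dtype=np.float64)) @ weights
        return [labels[i] for i in scores.argmax(axis=1)]
```

In this sketch, adding a generator means calling add_class_samples with its label and encoder features, then re-solving. The statistics of earlier classes are never modified, which is why no stored exemplars are needed to retain them.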

In practice

Employ bf16 precision for class statistics to reduce storage.
Use merged bf16 statistics for new-class-only incremental updates.
Consider per-class bf16 statistics for updates that add samples to old classes.
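
To make the storage trade-off concrete, here is a rough sketch of bf16 storage and of the footprint under the two layouts. It rests on assumptions not stated in the summary: that "merged" means a single Gram matrix shared by all classes plus one feature-sum vector per class, and that "per-class" means one Gram matrix per class. NumPy has no native bfloat16 type, so the hypothetical helpers to_bf16_bits and from_bf16_bits emulate it by keeping the upper 16 bits of a float32; stats_bytes is likewise an illustrative name.

```python
import numpy as np


def to_bf16_bits(x):
    """Round finite float32 values to bfloat16, returned as uint16 bit patterns."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    # Round to nearest, ties to even, before dropping the low 16 bits.
    rounding = ((bits >> 16) & np.uint32(1)) + np.uint32(0x7FFF)
    return ((bits + rounding) >> 16).astype(np.uint16)


def from_bf16_bits(b):
    """Expand uint16 bfloat16 bit patterns back to float32."""
    wide = np.asarray(b, dtype=np.uint16).astype(np.uint32) << 16
    return wide.view(np.float32)


def stats_bytes(num_classes, dim, bytes_per_value, merged):
    """Bytes needed for Gram matrices plus per-class feature sums (assumed layout)."""
    gram_count = 1 if merged else num_classes
    return (gram_count * dim * dim + num_classes * dim) * bytes_per_value
```

Under these assumptions, a merged Gram matrix can absorb a new class by adding that class's contribution, but it cannot be reweighted when an old class gains samples, which is one plausible reason to keep per-class statistics in that case.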

Topics

Machine-Generated Text Attribution
Lifelong Learning
Continual Learning
Ridge Regression
Random Features
Large Language Models

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.