Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

OpAI-Bench is a new operation-guided benchmark for multi-granularity AI-text detection. It addresses a limitation of existing benchmarks, which focus on final outputs, by constructing nine sequentially revised versions of each human-written document to simulate progressive human-to-AI co-editing across four domains: student essays, news articles, government reports, and scientific abstracts. Revisions use five AI edit operations (polish, paraphrase, style rewrite, compress, and expand), and authorship provenance is preserved at document, sentence, token, and span granularities. Experiments with 8 document-level, 7 sentence-level, and 2 fine-grained detectors show that AI-text detectability is non-monotonic: it depends on the edit operation, the domain, and the cumulative revision history, not solely on the proportion of AI-edited content. Mixed-authorship intermediate versions, particularly those around version v4 under the compress operation, are often harder to detect than either fully human or heavily AI-edited texts.
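
Token-level provenance is the finest of the four granularities, and the coarser views can in principle be derived from it. The sketch below assumes each token carries a 0 (human) or 1 (AI) label and that sentence boundaries are given as token offsets. The helper names and the 0.5 sentence threshold are hypothetical; the summary does not say how OpAI-Bench assigns sentence- or document-level labels.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int  # first AI-authored token (inclusive)
    end: int    # one past the last AI-authored token (exclusive)


def ai_spans(token_labels: list[int]) -> list[Span]:
    """Collect maximal runs of AI-authored tokens (label 1) as spans."""
    spans: list[Span] = []
    start = None
    for i, label in enumerate(token_labels):
        if label == 1 and start is None:
            start = i  # a new AI run begins
        elif label == 0 and start is not None:
            spans.append(Span(start, i))  # the current AI run ends
            start = None
    if start is not None:  # an AI run reaches the end of the document
        spans.append(Span(start, len(token_labels)))
    return spans


def sentence_labels(token_labels, sentence_bounds, threshold=0.5):
    """Mark a sentence AI (1) when its AI-token fraction exceeds threshold.

    sentence_bounds holds (start, end) token offsets; threshold is a
    hypothetical choice, not a value taken from the benchmark.
    """
    labels = []
    for start, end in sentence_bounds:
        chunk = token_labels[start:end]
        fraction = sum(chunk) / len(chunk) if chunk else 0.0
        labels.append(1 if fraction > threshold else 0)
    return labels


def document_ai_fraction(token_labels):
    """Share of tokens with AI provenance; one possible document-level score."""
    return sum(token_labels) / len(token_labels) if token_labels else 0.0


tokens = [0, 0, 1, 1, 0, 1]
print(ai_spans(tokens))                           # [Span(start=2, end=4), Span(start=5, end=6)]
print(sentence_labels(tokens, [(0, 3), (3, 6)]))  # [0, 1]
print(document_ai_fraction(tokens))               # 0.5
```

Deriving every view from a single token mask keeps the four label sets mutually consistent across versions.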

Key takeaway

Machine Learning Engineers developing AI-text detection systems should move beyond binary endpoint classification and adopt trajectory-aware and operation-aware evaluation. Detectors need to account for non-monotonic detectability, especially on mixed-authorship content and under edit operations such as compression, which can substantially reduce detectability. This approach should yield more robust and reliable detection tools for real-world human-AI co-editing workflows.
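
One way to make evaluation trajectory- and operation-aware is to compute a detector's flag rate separately for each (version, operation) pair and then test whether the resulting curve is monotonic. The record format, function names, and toy values below are hypothetical illustrations, not OpAI-Bench data or tooling.

```python
from collections import defaultdict


def detection_rate_curve(records, operation):
    """Fraction of documents flagged as AI at each version for one operation.

    records: iterable of (version, operation, flagged) tuples, where flagged
    is 1 if the detector predicted "AI" and 0 otherwise.
    Returns (version, rate) pairs sorted by version.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for version, op, pred in records:
        if op != operation:
            continue  # keep only the requested edit operation
        total[version] += 1
        flagged[version] += pred
    return [(v, flagged[v] / total[v]) for v in sorted(total)]


def is_monotonic(rates):
    """True if the sequence never decreases or never increases."""
    pairs = list(zip(rates, rates[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Toy predictions for illustration only; these are not benchmark results.
toy = [
    (1, "compress", 1), (1, "compress", 0),
    (2, "compress", 0), (2, "compress", 0),
    (3, "compress", 1), (3, "compress", 1),
    (1, "polish", 0),
]
curve = detection_rate_curve(toy, "compress")
print(curve)                                # [(1, 0.5), (2, 0.0), (3, 1.0)]
print(is_monotonic([r for _, r in curve]))  # False
```

Reporting a flag rate rather than accuracy sidesteps the question of which intermediate versions should count as AI-written, which the summary does not settle.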

Key insights

AI-text detectability is non-monotonic, influenced by edit operations and revision history, not just AI content proportion.

Method

OpAI-Bench constructs nine versions per human document, progressively editing the previous version using five AI operations (polish, paraphrase, style rewrite, compress, expand) at increasing AI coverage, preserving multi-granularity provenance.
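
The progressive construction can be sketched as a loop in which each version starts from the previous one and only newly covered sentences are rewritten. This is a minimal sketch under assumptions the summary does not state: a linear coverage schedule over sentences, whitespace tokenization, `difflib` word alignment to carry token provenance forward (unchanged words keep their label, while inserted or replaced words become AI), and a hypothetical `edit` callable standing in for an LLM applying one of the operations.

```python
import difflib
from typing import Callable

Token = tuple[str, int]  # (word, label) with 0 = human, 1 = AI


def carry_provenance(old: list[Token], new_words: list[str]) -> list[Token]:
    """Align edited words to the previous tokens and propagate labels."""
    old_words = [w for w, _ in old]
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out: list[Token] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old[i1:i2])  # untouched words keep their provenance
        elif tag in ("replace", "insert"):
            out.extend((w, 1) for w in new_words[j1:j2])  # new text is AI
        # "delete": the old words are dropped and nothing is added
    return out


def build_versions(sentences: list[str], edit: Callable[[str], str],
                   n_versions: int = 9) -> list[list[list[Token]]]:
    """Produce n_versions progressively edited copies of one document."""
    current = [[(w, 0) for w in s.split()] for s in sentences]  # all human
    versions = []
    edited = 0
    for k in range(1, n_versions + 1):
        # Integer ceil(k * n / n_versions): sentences covered so far.
        target = (k * len(sentences) + n_versions - 1) // n_versions
        current = [list(s) for s in current]  # copy so earlier versions stay intact
        for idx in range(edited, target):
            text = " ".join(w for w, _ in current[idx])
            current[idx] = carry_provenance(current[idx], edit(text).split())
        edited = target
        versions.append(current)
    return versions


def toy_polish(text: str) -> str:
    """Hypothetical stand-in for an LLM 'polish' call."""
    return text.replace("good", "excellent")


doc = ["the results were good", "we thank the reviewers", "code is available"]
versions = build_versions(doc, toy_polish)
print(len(versions))   # 9
print(versions[0][0])  # [('the', 0), ('results', 0), ('were', 0), ('excellent', 1)]
```

Two design choices here are worth flagging. Words an edit leaves unchanged stay labeled human, so provenance reflects actual rewrites rather than which sentences were sent to the model. Re-editing already covered sentences, which would make revision history compound, is a one-line change (`range(0, target)` instead of `range(edited, target)`).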

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.