Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

2026-06-04 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

OpAI-Bench is a new operation-guided benchmark designed to study progressive human-to-AI text transformation for multi-granularity AI-text detection. It addresses the gap in existing benchmarks by focusing on co-editing workflows rather than just final outputs. Starting from human-written documents, OpAI-Bench constructs nine sequentially revised versions for each sample, incorporating predefined AI coverage levels and five representative AI edit operations across four domains, while preserving complete authorship provenance. The benchmark supports comprehensive evaluation using 8 document-level, 7 sentence-level, and 2 fine-grained token/span-level detectors. Experiments reveal that AI-text detectability is influenced by the proportion of AI-edited content, edit operation, domain, and cumulative revision history. Notably, mixed-authorship intermediate versions are often harder to detect than fully human or heavily AI-edited endpoints, indicating non-monotonic detection patterns.

Key takeaway

AI scientists and machine learning engineers developing AI-text detectors should account for human-AI co-editing workflows. Detectability does not scale simply with the proportion of AI-generated content: mixed-authorship intermediate versions are often harder to detect than fully human or heavily AI-edited endpoints. Adding progressive transformation benchmarks such as OpAI-Bench to your test suite exposes these non-monotonic detection patterns and helps harden detectors against realistic revision scenarios.

Key insights

AI-text detectability is non-monotonic, influenced by edit operations and revision history, not just AI content proportion.

Principles

AI authorship signals evolve progressively.
Detectability depends on edit operation and domain.
Mixed-authorship texts are often harder to detect than fully human or heavily AI-edited ones.

Method

Starting from a human-written document, OpAI-Bench constructs nine sequentially revised versions per sample, using predefined AI coverage levels and five AI edit operations across four domains, and preserves authorship provenance at document, sentence, and token/span granularity.
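
As a rough illustration of what multi-granularity provenance could look like, the sketch below stores token-level author labels for one version and derives sentence-level and document-level views from them. The `Revision` class, its field names, the operation name "paraphrase", and the rule that a sentence counts as AI-touched when any of its tokens is AI-authored are all assumptions made for this example, not the benchmark's actual schema.

```python
from dataclasses import dataclass
from typing import List

HUMAN, AI = 0, 1  # hypothetical token-level author labels


@dataclass
class Revision:
    """One version in a revision chain (hypothetical schema)."""
    step: int                       # position in the revision chain
    operation: str                  # edit operation applied at this step
    sentences: List[List[str]]      # tokens grouped by sentence
    token_authors: List[List[int]]  # HUMAN/AI label per token, same shape

    def ai_coverage(self) -> float:
        """Fraction of tokens labeled AI (document-level view)."""
        flat = [a for sent in self.token_authors for a in sent]
        return sum(flat) / len(flat) if flat else 0.0

    def sentence_labels(self) -> List[int]:
        """1 if any token in the sentence is AI-authored (assumed rule)."""
        return [int(any(a == AI for a in sent)) for sent in self.token_authors]


# Toy example: "paraphrase" is a placeholder operation name.
rev = Revision(
    step=1,
    operation="paraphrase",
    sentences=[["The", "cat", "sat", "."], ["It", "purred", "."]],
    token_authors=[[HUMAN, HUMAN, HUMAN, HUMAN], [HUMAN, AI, HUMAN]],
)
print(rev.sentence_labels())        # [0, 1]
print(round(rev.ai_coverage(), 3))  # 0.143
```

Keeping the finest-grained labels as the single stored record and deriving the coarser views from it is one way to keep the three granularities consistent with each other.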

In practice

Evaluate detectors on progressive revisions.
Test across diverse AI edit operations.
Account for non-monotonic detection.
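
The practices above can be sketched as a small evaluation helper: group detector decisions by revision step, compute accuracy per step, and check whether the resulting curve is monotonic. The function names, the `(step, correct)` record layout, and the toy records are hypothetical; the values are made up to exercise the code and are not results from the benchmark.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def accuracy_by_step(records: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Fraction of correct detector decisions at each revision step."""
    hits: Dict[int, List[bool]] = defaultdict(list)
    for step, correct in records:
        hits[step].append(correct)
    return {step: sum(v) / len(v) for step, v in sorted(hits.items())}


def is_monotonic(values: Sequence[float], tol: float = 0.0) -> bool:
    """True if values never decrease, or never increase, beyond tol."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d >= -tol for d in diffs) or all(d <= tol for d in diffs)


# Toy records: (revision step, was the detector's decision correct?)
toy = [
    (0, True), (0, True),
    (1, True), (1, False),
    (2, False), (2, False),
    (3, True), (3, True),
]
curve = accuracy_by_step(toy)
print(curve)                               # {0: 1.0, 1: 0.5, 2: 0.0, 3: 1.0}
print(is_monotonic(list(curve.values())))  # False
```

The same grouping can be keyed on edit operation or domain instead of step to compare detectors along those axes.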

Topics

AI-text Detection
Human-AI Co-editing
Text Transformation
Benchmark Development
Authorship Analysis
Multi-Granularity Detection

Code references

VILA-Lab/OpAI-Bench

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.