SenFlow: Inter-Sentence Flow Modeling for AI-Generated Text Detection in Hybrid Documents
Summary
SenFlow is a method for sentence-level AI-generated text detection (S-AGTD) in hybrid documents, where human-written and LLM-generated content co-exist. It targets two limitations of prior work: detectors that classify each sentence in isolation, and evaluation on outdated benchmarks. To address the second, the authors constructed MOSAIC, a new benchmark of 16,000 hybrid documents derived from PubMed and XSum, with AI content generated by DeepSeek-V3.2 and Kimi K2. MOSAIC applies a stringent perplexity-consistency filter, which previous benchmarks lack. SenFlow recasts S-AGTD as a structured prediction problem over a document's sentence sequence, integrating graph-based inter-sentence propagation with linear-chain CRF decoding in a single document-level pass. This approach achieves state-of-the-art performance on MOSAIC, with a +4.15 percentage-point average Macro-F1 margin on cross-domain transfer, the most challenging protocol. The research also found that, even with perplexity filtering, AI insertions retain a generator-dependent sentence-length gap.
Key takeaway
For NLP engineers developing AI-generated text detection systems, recognize that isolated sentence analysis is insufficient for hybrid documents. Your models should incorporate inter-sentence dependencies, as demonstrated by SenFlow's structured prediction approach, to achieve higher accuracy. Consider evaluating your detectors against the MOSAIC benchmark, which includes recent LLM outputs and a perplexity-consistency filter, to ensure robustness against modern generators.
Key insights
SenFlow improves sentence-level AI-generated text detection in hybrid documents by modeling inter-sentence dependencies, and is evaluated on the new MOSAIC benchmark.
Principles
- Inter-sentence dependencies are crucial for S-AGTD.
- Structured prediction enhances sentence-level detection.
- Perplexity filters don't eliminate all AI cues.
Method
SenFlow recasts S-AGTD as structured prediction, integrating graph-based inter-sentence propagation with linear-chain CRF decoding in a single document-level pass over a sentence graph.
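The two ingredients named above can be illustrated with a minimal sketch. This is not the authors' implementation: the neighbour-averaging rule, the `alpha` mixing weight, the hand-set transition scores, and the function names `propagate`, `viterbi`, and `detect` are all assumptions chosen for illustration. It takes per-sentence scores for the labels human and AI, smooths them over a sentence graph, then finds the best label sequence with Viterbi decoding, the standard inference step for a linear-chain CRF.

```python
from typing import List, Sequence, Tuple

LABELS = ("human", "ai")  # label index 0 = human, 1 = AI


def propagate(scores: Sequence[Sequence[float]],
              edges: Sequence[Tuple[int, int]],
              alpha: float = 0.5) -> List[List[float]]:
    """One round of propagation: mix each sentence's scores with the
    mean scores of its graph neighbours (illustrative rule, assumed)."""
    n = len(scores)
    neighbours: List[List[int]] = [[] for _ in range(n)]
    for i, j in edges:  # edges are treated as undirected
        neighbours[i].append(j)
        neighbours[j].append(i)
    updated = []
    for i in range(n):
        if not neighbours[i]:  # isolated sentence: keep its own scores
            updated.append(list(scores[i]))
            continue
        k_range = range(len(scores[i]))
        mean = [sum(scores[j][k] for j in neighbours[i]) / len(neighbours[i])
                for k in k_range]
        updated.append([(1 - alpha) * scores[i][k] + alpha * mean[k]
                        for k in k_range])
    return updated


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring label sequence, where a sequence's score is the sum
    of its emission scores and its label-to-label transition scores."""
    if not emissions:
        return []
    num_labels = len(transitions)
    best = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for cur in range(num_labels):
            prev = max(range(num_labels),
                       key=lambda p: best[p] + transitions[p][cur])
            new_best.append(best[prev] + transitions[prev][cur]
                            + emissions[t][cur])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    path = [max(range(num_labels), key=lambda k: best[k])]
    for pointers in reversed(backpointers):  # walk back from the last label
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def detect(scores, edges, transitions, alpha=0.5):
    """Propagate over the sentence graph, then decode the whole document."""
    return [LABELS[k] for k in viterbi(propagate(scores, edges, alpha),
                                       transitions)]


# Hypothetical scores for a five-sentence document with a chain graph.
scores = [[3.0, 0.0], [3.0, 0.0], [0.0, 3.0], [0.6, 0.4], [0.0, 3.0]]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
transitions = [[0.5, -0.5], [-0.5, 0.5]]  # reward staying in the same label
print(detect(scores, edges, transitions))
```

In this toy input the fourth sentence leans slightly human when scored alone, but its neighbours and the transition scores pull it into the surrounding AI span. In a trained system the emission and transition scores would be learned rather than set by hand.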
In practice
- Use the MOSAIC benchmark for S-AGTD evaluation.
- Consider inter-sentence context for detection.
- Analyze sentence length as an AI insertion cue.
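As a starting point for the sentence-length check, the sketch below compares mean sentence length between human and AI sentences. The input format (pairs of sentence text and label), the whitespace tokenization, and the function name `length_gap` are assumptions for illustration, not part of MOSAIC or SenFlow.

```python
from statistics import mean
from typing import Iterable, Tuple


def length_gap(sentences: Iterable[Tuple[str, str]]) -> float:
    """Mean AI sentence length minus mean human sentence length,
    measured in whitespace-separated tokens (a crude, assumed tokenizer)."""
    lengths = {"human": [], "ai": []}
    for text, label in sentences:
        lengths[label].append(len(text.split()))
    if not lengths["human"] or not lengths["ai"]:
        raise ValueError("need at least one sentence for each label")
    return mean(lengths["ai"]) - mean(lengths["human"])


# Hypothetical labeled sentences from one hybrid document.
example = [
    ("The trial enrolled forty patients.", "human"),
    ("Results were mixed.", "human"),
    ("Overall, the findings suggest that further study is clearly warranted.", "ai"),
]
print(length_gap(example))
```

Because the reported gap is generator-dependent, it makes sense to run this separately on the subset of documents produced by each generator rather than pooling them.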
Topics
- AI-Generated Text Detection
- Hybrid Documents
- Structured Prediction
- Inter-Sentence Dependencies
- MOSAIC Benchmark
- Large Language Models
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.