Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Detector Evasion Policy Optimization (DEPO) is a Lagrangian primal-dual reinforcement learning algorithm for LLM paraphrasing that evades AI-text detectors while preserving semantic meaning. Existing methods often degrade fine-grained semantics or offer only indirect control over the trade-off between evasion and semantic preservation. DEPO formulates detector-evasive paraphrasing as a Constrained Markov Decision Process (CMDP) in which detector evasion is the primary objective and semantic preservation is an explicit constraint. A GRPO-style group-based policy update lets the algorithm balance the two adaptively during training. In experiments on the MAGE, M4, RAID, and peer-review datasets, evaluated against the MAGE, RoBERTa, RADAR, Binoculars, and Fast-DetectGPT detectors, DEPO achieves strong detector evasion while precisely satisfying its semantic-preservation constraint. It is also robust across domains, detectors, and prompt levels.
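
The constrained formulation can be written out in a generic form. The block below is a sketch of a standard CMDP objective and its Lagrangian relaxation, not the paper's exact notation: the symbols r_evade (evasion reward), s (semantic-similarity score), tau (similarity threshold), lambda (multiplier), and eta (dual step size) are all assumed here for illustration.

```latex
\begin{aligned}
\max_{\pi}\quad & \mathbb{E}_{x,\; y \sim \pi(\cdot \mid x)}\big[ r_{\text{evade}}(y) \big] \\
\text{s.t.}\quad & \mathbb{E}_{x,\; y \sim \pi(\cdot \mid x)}\big[ s(x, y) \big] \ge \tau \\[4pt]
\mathcal{L}(\pi, \lambda) &= \mathbb{E}\big[ r_{\text{evade}}(y) \big]
  + \lambda \Big( \mathbb{E}\big[ s(x, y) \big] - \tau \Big), \qquad \lambda \ge 0 \\
\lambda &\leftarrow \max\!\Big( 0,\; \lambda - \eta \big( \mathbb{E}\big[ s(x, y) \big] - \tau \big) \Big)
\end{aligned}
```

In a primal-dual scheme of this kind, the policy ascends the Lagrangian while the multiplier grows whenever average similarity falls below the threshold and shrinks toward zero once the constraint holds, which is what makes the balance adaptive instead of a fixed reward weight.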

Key takeaway

For machine learning engineers building LLM-based text generation or paraphrasing tools, DEPO shows that detector evasion and semantic preservation can be controlled directly: evasion is optimized as the objective while semantic integrity is enforced as an explicit constraint, a balance that other methods control only indirectly. Consider constrained policy optimization when an LLM application must pursue one objective without sacrificing semantic accuracy, especially as detection mechanisms evolve.

Key insights

DEPO enables LLM paraphrasing to evade AI-text detectors while explicitly constraining semantic preservation using a Constrained Markov Decision Process.

Method

DEPO uses a Lagrangian primal-dual reinforcement learning algorithm with a GRPO-style group-based policy update to optimize detector evasion within a prescribed semantic-preservation region.
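
To make the interplay of the two updates concrete, here is a minimal sketch, assuming a scalar evasion score and a scalar similarity score per sampled paraphrase. It is not the authors' implementation: the function names, the learning rate, and the threshold are hypothetical, and the clipped likelihood-ratio policy loss that a GRPO-style update would apply to these advantages is omitted.

```python
from statistics import mean, pstdev


def group_advantages(evasion, similarity, lam, eps=1e-8):
    """Group-relative advantages for the paraphrases sampled from one prompt.

    Assumption: the per-sample reward is the Lagrangian-shaped sum
    evasion + lam * similarity. The constant -lam * threshold is dropped
    because it cancels when the group mean is subtracted.
    """
    rewards = [e + lam * s for e, s in zip(evasion, similarity)]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Standardize within the group, as GRPO-style updates do.
    return [(r - mu) / (sigma + eps) for r in rewards]


def dual_update(lam, similarity, threshold, lr=0.1):
    """Projected gradient step on the multiplier (hypothetical step size)."""
    violation = threshold - mean(similarity)  # > 0 when the constraint is violated
    return max(0.0, lam + lr * violation)


# Hypothetical scores for four paraphrases of one input text.
evasion = [0.9, 0.2, 0.6, 0.4]
similarity = [0.70, 0.95, 0.85, 0.90]

advantages = group_advantages(evasion, similarity, lam=1.0)
new_lam = dual_update(1.0, similarity, threshold=0.9)
```

In this example the mean similarity (0.85) is below the assumed threshold (0.9), so the multiplier rises and the next round of advantages weights semantic similarity more heavily. Once the constraint is met the multiplier decays, and evasion dominates the reward again.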

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.