Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization
Summary
Detector Evasion Policy Optimization (DEPO) is a new Lagrangian primal-dual reinforcement learning algorithm designed to enable LLM paraphrasing that evades AI-text detectors while precisely preserving semantic meaning. Existing methods often degrade fine-grained semantics or offer only indirect control over the evasion-semantics trade-off. DEPO addresses this by formulating detector-evasive paraphrasing as a Constrained Markov Decision Process, where detector evasion is the primary objective and semantic preservation is an explicit constraint. The algorithm incorporates a novel GRPO-style group-based policy update, allowing it to adaptively balance these objectives during training. Experiments conducted on MAGE, M4, RAID, and peer-review datasets, and evaluated against MAGE, RoBERTa, RADAR, Binoculars, and Fast-DetectGPT detectors, demonstrate that DEPO achieves strong detector evasion while precisely satisfying its semantic preservation constraint. Furthermore, DEPO exhibits robustness across different domains, detectors, and prompt levels.
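The constrained formulation described above can be written in the standard CMDP form. This is a sketch in notation assumed here for illustration, not taken from the original article: $\pi_\theta$ is the paraphrasing policy, $R_{\text{evade}}$ a detector-evasion reward, $R_{\text{sem}}$ a semantic-similarity score, $\tau$ the required preservation level, and $\lambda$ the Lagrange multiplier.

```latex
\begin{aligned}
\max_{\theta}\quad
  & J_{\text{evade}}(\pi_\theta)
    = \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
      \big[ R_{\text{evade}}(y) \big] \\
\text{s.t.}\quad
  & J_{\text{sem}}(\pi_\theta)
    = \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
      \big[ R_{\text{sem}}(x, y) \big] \;\ge\; \tau \\[6pt]
\mathcal{L}(\theta, \lambda)
  &= J_{\text{evade}}(\pi_\theta)
     + \lambda \,\big( J_{\text{sem}}(\pi_\theta) - \tau \big),
  \qquad \lambda \ge 0 \\[6pt]
  & \min_{\lambda \ge 0}\; \max_{\theta}\; \mathcal{L}(\theta, \lambda)
\end{aligned}
```

In a primal-dual scheme of this kind, the policy ascends $\mathcal{L}$ while $\lambda$ grows when the constraint is violated and shrinks toward zero when it is satisfied, which is one common way to balance the two objectives adaptively during training.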
Key takeaway
For machine learning engineers building LLM-based text generation or paraphrasing tools, DEPO offers a way to evade AI-text detectors while keeping semantic preservation under explicit control, a trade-off that existing methods often degrade or control only indirectly. Consider constrained policy optimization whenever a primary objective must be optimized subject to a firm requirement on output quality, especially as detection mechanisms evolve.
Key insights
DEPO casts detector-evasive paraphrasing as a Constrained Markov Decision Process: the policy maximizes detector evasion subject to an explicit semantic-preservation constraint.
Principles
- Formulate detector evasion as the primary objective.
- Enforce semantic preservation as an explicit constraint.
- Balance the two objectives adaptively during training.
Method
DEPO uses a Lagrangian primal-dual reinforcement learning algorithm with a GRPO-style group-based policy update to optimize detector evasion within a prescribed semantic-preservation region.
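The update described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `group_advantages` and `dual_update`, the reward scales, the threshold `tau`, and the dual learning rate are all hypothetical. For one prompt, a group of sampled paraphrases is scored, the evasion and semantic scores are combined into a Lagrangian reward, and advantages are normalized within the group in the GRPO style. The multiplier is then adjusted by projected gradient ascent on the constraint violation.

```python
import numpy as np


def group_advantages(evade, sem, lam, tau, eps=1e-8):
    """Group-relative advantages of a Lagrangian reward for one prompt.

    evade: detector-evasion scores, one per sampled paraphrase (assumed in [0, 1])
    sem:   semantic-similarity scores for the same paraphrases (assumed in [0, 1])
    lam:   current Lagrange multiplier (>= 0)
    tau:   required semantic-preservation level (hypothetical threshold)
    """
    evade = np.asarray(evade, dtype=float)
    sem = np.asarray(sem, dtype=float)
    # Lagrangian reward: evasion plus a weighted constraint slack.
    lagrangian = evade + lam * (sem - tau)
    # GRPO-style normalization: center and scale within the sampled group.
    return (lagrangian - lagrangian.mean()) / (lagrangian.std() + eps)


def dual_update(lam, sem, tau, lr=0.1):
    """Projected gradient step on the multiplier.

    The multiplier grows when mean similarity falls below tau and shrinks
    when the constraint holds; it is clipped so it never becomes negative.
    """
    violation = tau - float(np.mean(sem))
    return max(0.0, lam + lr * violation)


if __name__ == "__main__":
    # Toy scores for a group of four paraphrases of one prompt.
    evade = [0.9, 0.2, 0.6, 0.4]
    sem = [0.70, 0.95, 0.85, 0.90]
    tau = 0.9

    for lam in (1.0, 10.0):
        adv = group_advantages(evade, sem, lam, tau)
        print(f"lam={lam}: advantages={np.round(adv, 3)}")

    print("updated multiplier:", dual_update(1.0, sem, tau))
```

With a small multiplier the most evasive sample receives the largest advantage; with a large multiplier the sample that best preserves meaning does. The advantages would then weight the policy-gradient term for each sample's tokens, which is omitted here.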
In practice
- Achieve strong detector evasion.
- Ensure precise semantic preservation.
- Maintain cross-domain robustness.
Topics
- LLM Paraphrasing
- AI-text Detectors
- Constrained Policy Optimization
- Reinforcement Learning
- Semantic Preservation
- Policy Optimization
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.