A Linguistics-Aware LLM Watermarking via Syntactic Predictability

2025-10-10 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

STELA is a watermarking framework for large language models (LLMs) that balances text quality against detection robustness while remaining publicly verifiable. Previous methods rely on model-specific output distributions and therefore need access to logits at detection time. STELA instead aligns watermark strength with the linguistic degrees of freedom in the text. It models linguistic indeterminacy with part-of-speech (POS) n-grams and adjusts the watermark signal dynamically: weaker in grammatically constrained contexts to preserve text quality, stronger in flexible contexts to improve detectability. This enables publicly verifiable detection without access to the underlying model's logits. Experiments across English, Chinese, and Korean show that STELA outperforms prior watermarking techniques in detection robustness.

Key takeaway

For research scientists developing LLM governance tools, STELA offers a robust, publicly verifiable watermarking approach. Consider linguistics-aware methods like STELA where logit-dependent detection is a limitation, for example when the party verifying the text has no access to the generating model. According to the paper, the approach improves detection robustness across English, Chinese, and Korean while preserving text quality.

Key insights

STELA is a linguistics-aware LLM watermarking method whose detection is publicly verifiable and needs no access to model logits.

Principles

Balance text quality and detection robustness.
Align watermark strength with linguistic flexibility.

Method

STELA modulates watermark signal strength using part-of-speech (POS) n-gram-modeled linguistic indeterminacy, weakening it in grammatically constrained contexts and strengthening it in flexible contexts.
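
To make the modulation idea concrete, here is a minimal sketch, not the authors' implementation. It assumes POS bigrams, uses the normalized Shannon entropy of the next-tag distribution as a stand-in for "linguistic indeterminacy", and scales a logit bias linearly with it. The function names, the `delta_max` parameter, and the toy corpus are all hypothetical; the source does not specify the n-gram order, the indeterminacy measure, or the scaling rule.

```python
import math
from collections import Counter, defaultdict


def pos_bigram_entropy(tagged_corpus):
    """Map each POS tag to the normalized entropy of the tag that follows it.

    Assumption: entropy of a bigram model is used as the indeterminacy
    measure; the paper may define it differently.
    """
    counts = defaultdict(Counter)
    tagset = set()
    for sentence in tagged_corpus:
        tagset.update(sentence)
        for prev_tag, next_tag in zip(sentence, sentence[1:]):
            counts[prev_tag][next_tag] += 1

    # Normalize by the maximum possible entropy so values fall in [0, 1].
    max_entropy = math.log2(len(tagset)) if len(tagset) > 1 else 1.0
    entropy = {}
    for prev_tag, next_counts in counts.items():
        total = sum(next_counts.values())
        h = sum((c / total) * math.log2(total / c) for c in next_counts.values())
        entropy[prev_tag] = h / max_entropy
    return entropy


def watermark_strength(prev_tag, entropy, delta_max=2.0, default=0.5):
    """Hypothetical linear rule: bias grows with syntactic flexibility."""
    return delta_max * entropy.get(prev_tag, default)


# Toy POS-tagged corpus (hypothetical data, for illustration only).
corpus = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB", "ADV"],
    ["NOUN", "VERB", "ADJ"],
    ["DET", "NOUN", "VERB", "NOUN"],
]
entropy = pos_bigram_entropy(corpus)
for tag in ("NOUN", "DET", "VERB"):
    print(tag, round(watermark_strength(tag, entropy), 3))
```

In this toy corpus a NOUN is always followed by a VERB, so the position after a NOUN is fully constrained and receives zero bias, while the position after a VERB, which has four equally likely successors, receives the largest bias.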

In practice

Apply STELA for publicly verifiable LLM content attribution.
Use POS n-grams to model linguistic indeterminacy.
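
As a rough illustration of what logit-free detection can look like, the sketch below scores a text using only its tokens, a key, and per-position weights. It swaps in a common green-list scheme with a weighted z-test, which is an assumption rather than STELA's documented statistic. In a linguistics-aware setup the weights would come from POS-based indeterminacy; here they are passed in directly. `is_green`, `weighted_z_score`, `GAMMA`, and the key are hypothetical names and values.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary marked "green" at each step


def is_green(prev_token, token, key="demo-key"):
    """Keyed hash of (previous token, token) decides green-list membership.

    Assumption: a hash-based partition stands in for whatever partition
    the real scheme uses. No model or logits are consulted.
    """
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    return digest[0] < 256 * GAMMA


def weighted_z_score(tokens, weights, key="demo-key"):
    """Weighted one-proportion z-test on green-token hits.

    weights[i] is the weight for the transition tokens[i] -> tokens[i + 1],
    e.g. larger where syntax is flexible. Under the null hypothesis
    (unwatermarked text) each hit is Bernoulli(GAMMA), so the weighted sum
    has mean GAMMA * sum(w) and variance GAMMA * (1 - GAMMA) * sum(w^2).
    """
    pairs = list(zip(tokens, tokens[1:]))
    if len(weights) != len(pairs):
        raise ValueError("need exactly one weight per token transition")
    observed = sum(w for (prev, tok), w in zip(pairs, weights) if is_green(prev, tok, key))
    expected = GAMMA * sum(weights)
    variance = GAMMA * (1 - GAMMA) * sum(w * w for w in weights)
    if variance == 0:
        return 0.0
    return (observed - expected) / math.sqrt(variance)


# Usage: score a short token sequence with uniform weights.
tokens = "the model writes a short sentence about language".split()
weights = [1.0] * (len(tokens) - 1)
print(round(weighted_z_score(tokens, weights), 3))
```

A large positive score suggests more green tokens than chance would produce. Setting a weight to zero removes that position from the test, which is one way to discount grammatically constrained positions where little watermark signal was embedded.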

Topics

LLM Watermarking
Syntactic Predictability
Linguistic Indeterminacy
Part-of-Speech n-grams
Public Verifiability

Code references

Shinwoo-Park/stela_watermark

Best for: Research Scientist, AI Scientist, NLP Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.