CORE-BREW: LLR-Based Soft Decoding for Robust Multi-Bit LLM Watermarking

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, medium

Summary

CORE-BREW is a multi-bit watermarking technique for establishing the provenance of Large Language Model (LLM) outputs. It aims to stay robust to editing while keeping false positives under control. Existing watermarks based on error-correcting codes (ECC) often discard token-level reliability information by using hard-decision decoding. CORE-BREW instead uses Constant-hit-Rate Embedding, an extension of block-wise BREW, which calibrates the watermark channel to a fixed target hit rate, p-star. That calibration yields closed-form per-token log-likelihood ratios (LLRs), which in turn enable principled soft-decision decoding. The system supports two detection modes: Strict-Safe, which preserves a bounded-distance acceptance region around the designated codeword, and FPR-Calibrated, which uses likelihood-based scoring and lightweight list decoding to characterize the trade-off between false-positive rate (FPR) and true-positive rate (TPR). In experiments on open-source LLMs, the authors report stronger low-FPR discrimination and greater robustness to token-level edits and paraphrasing, with comparable semantic quality.
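To make the soft-decision idea concrete, here is a minimal sketch. It is not the paper's implementation. It assumes the calibrated channel can be modeled as a binary symmetric channel in which each token's vote matches the embedded bit with probability p-star, and that token votes are conditionally independent so their LLRs add. The function names, the (position, vote) observation format, and the value p_star = 0.75 are hypothetical choices made for illustration.

```python
import math


def vote_llr(vote: int, p_star: float) -> float:
    """LLR log P(vote | bit=1) / P(vote | bit=0) for one token vote.

    Assumes a binary symmetric channel: the vote equals the embedded
    bit with probability p_star (an illustrative model, not the paper's).
    """
    if not 0.5 < p_star < 1.0:
        raise ValueError("p_star must lie strictly between 0.5 and 1")
    if vote not in (0, 1):
        raise ValueError("vote must be 0 or 1")
    magnitude = math.log(p_star / (1.0 - p_star))
    return magnitude if vote == 1 else -magnitude


def accumulate_bit_llrs(observations, n_bits, p_star):
    """Sum per-token LLRs for each codeword bit position.

    observations: iterable of (position, vote) pairs (hypothetical format).
    """
    llrs = [0.0] * n_bits
    for position, vote in observations:
        llrs[position] += vote_llr(vote, p_star)
    return llrs


def hard_decisions(llrs):
    """Hard decoding keeps only the sign and discards the magnitude."""
    return [1 if llr > 0 else 0 for llr in llrs]


# Toy input: three tokens vote on bit 0, one on bit 1, three on bit 2.
observations = [(0, 1), (0, 1), (0, 0), (1, 0), (2, 1), (2, 1), (2, 1)]
bit_llrs = accumulate_bit_llrs(observations, n_bits=3, p_star=0.75)
print(bit_llrs)
print(hard_decisions(bit_llrs))
```

In this toy input, bits 0 and 2 both hard-decode to 1, but bit 2 carries three times the accumulated LLR of bit 0. A hard-decision decoder throws that difference away, while a soft-decision decoder can use it to weight reliable positions more heavily.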

Key takeaway

AI security engineers responsible for the provenance of LLM-generated content should evaluate CORE-BREW as a multi-bit watermarking option. Its LLR-based soft-decision decoding is reported to improve detection accuracy and resilience to common text edits and paraphrasing compared with prior methods. The approach also keeps false-positive rates under control, which is crucial for maintaining trust and accountability in AI outputs. Consider integrating CORE-BREW where the verifiability and integrity of your LLM applications matter.

Key insights

CORE-BREW employs LLR-based soft decoding for robust multi-bit LLM watermarking, enhancing detection and resilience to text modifications.

Principles

Method

CORE-BREW calibrates the watermark channel to a fixed hit rate, p-star, which yields closed-form per-token log-likelihood ratios (LLRs) and enables principled soft-decision decoding. Detection runs in one of two modes: Strict-Safe or FPR-Calibrated.
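The two detection modes can be illustrated with generic building blocks. The sketch below is an assumption-laden illustration, not the authors' procedure: Strict-Safe is modeled as a Hamming-radius check around the designated codeword, and FPR-Calibrated as a likelihood-style correlation score compared against a threshold taken from scores measured on unwatermarked text. All function names, the `radius` parameter, and the toy numbers are hypothetical.

```python
import math


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("bit lists must have the same length")
    return sum(x != y for x, y in zip(a, b))


def strict_safe_accept(hard_bits, designated, radius):
    """Bounded-distance test: accept only if the hard-decision word lies
    within `radius` bit flips of the designated codeword."""
    return hamming_distance(hard_bits, designated) <= radius


def likelihood_score(llrs, codeword):
    """Correlation score: add each LLR if the codeword bit is 1,
    subtract it if the bit is 0 (LLR sign convention: positive favors 1)."""
    return sum(llr if bit == 1 else -llr for llr, bit in zip(llrs, codeword))


def calibrate_threshold(null_scores, target_fpr):
    """Pick a threshold from scores of unwatermarked text so that the
    fraction of null scores strictly above it is at most target_fpr."""
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ordered = sorted(null_scores)
    n = len(ordered)
    allowed = math.floor(target_fpr * n)  # null scores allowed above threshold
    return ordered[n - allowed - 1]


# Toy null scores standing in for scores measured on unwatermarked text.
null_scores = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
threshold = calibrate_threshold(null_scores, target_fpr=0.25)

score = likelihood_score([2.0, -1.0, 0.5], [1, 0, 1])
print(threshold, score, score > threshold)
print(strict_safe_accept([1, 0, 1, 1], [1, 0, 0, 1], radius=1))
```

An empirical threshold of this kind only bounds the false-positive rate on the sample it was calibrated on; how CORE-BREW characterizes the FPR/TPR trade-off, and how its list decoding selects candidates, is described in the original paper rather than here.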

Best for: Research Scientist, AI Scientist, NLP Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.