Generalised Eigenvalue Geometry of Semantic Adversarial Attacks

· Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Data Science & Analytics · Depth: Expert, extended

Summary

This paper introduces a theoretical framework for semantic adversarial attacks under a two-model threat model: perturbations are budgeted in the embedding space of a proxy model, and their effect is measured in the embedding space of a target model. It defines a local attackability index, λ*(x), as the top generalized eigenvalue of a matrix pencil formed from the Jacobians of the two embedders. The index quantifies the worst-case local displacement of the target representation under a fixed proxy budget. The paper derives a closed-form prediction-flip condition for affine readouts. It also develops population-level attackability measures with uniform concentration bounds based on VC dimension and fat-shattering (margin) theory, and it connects this continuous theory to discrete paraphrase search. Experiments with FinBERT and Sentence-BERT on the Financial PhraseBank confirm the theoretical inequality Σ_w(x) ≤ λ*(x). They also show that the attackability-adjusted margin Z_w(x) predicts vulnerability with an AUC of approximately 0.91.
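The reported AUC measures how well a score ranks vulnerable inputs ahead of robust ones. The sketch below shows one standard way to compute it, the pairwise (Mann-Whitney) form, for a margin-style score. It assumes, hypothetically, that a smaller margin marks a more vulnerable input. The function name `margin_auc` and the toy numbers are illustrative only. They are not the paper's code or data.

```python
def margin_auc(margins, vulnerable):
    """Pairwise (Mann-Whitney) ROC AUC of a margin score as a vulnerability predictor.

    Assumption: a smaller margin means a more vulnerable input, so a vulnerable
    example "wins" a comparison when its margin is strictly lower than a robust
    example's margin. Ties count as half a win.
    """
    pos = [m for m, y in zip(margins, vulnerable) if y]      # inputs whose prediction flipped
    neg = [m for m, y in zip(margins, vulnerable) if not y]  # inputs that stayed robust
    if not pos or not neg:
        raise ValueError("need at least one vulnerable and one robust example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy margins and flip labels (made-up numbers, not the paper's results).
margins = [0.2, 0.5, 1.4, 0.9, 2.1, 0.3]
vulnerable = [True, True, False, True, False, False]
print(margin_auc(margins, vulnerable))  # 7 of 9 pairs ordered correctly -> 0.7777777777777778
```

The quadratic pairwise loop keeps the definition explicit. For large evaluation sets, a rank-based implementation such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently. Note that `roc_auc_score` expects higher scores for the positive class, so you would pass the negated margin.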

Key takeaway

AI security engineers evaluating the robustness of NLP models can diagnose semantic attackability with the proposed local attackability index λ*(x) and the adjusted margin Z_w(x). Both are derived from embedding Jacobians. Z_w(x) predicts vulnerability with high accuracy (AUC ≈ 0.91). Together they give a more complete assessment than finite paraphrase search alone, which can only report the attacks it happens to find. Prefer Z_w(x) when assessing the vulnerability of a specific readout. λ*(x) is a worst case over all perturbation directions and can therefore be overly conservative.
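A first-order sketch can show why a readout-specific margin can be sharper than λ*(x). It rests on several stated assumptions:

- The target representation shifts by J_T δ for a local input perturbation δ.
- The proxy budget is ‖J_P δ‖ ≤ ε.
- The readout is affine, f(z) = w·z + b.
- The pencil is (J_Tᵀ J_T, J_Pᵀ J_P).

Maximising w·J_T δ under that budget gives a largest readout shift of ε·sqrt(wᵀ M w), where M = J_T (J_Pᵀ J_P)⁻¹ J_Tᵀ. The top eigenvalue of M equals the pencil's top generalized eigenvalue. It follows that the readout-specific sensitivity wᵀ M w / ‖w‖² never exceeds λ*. The helper `readout_flip_check` and its "adjusted margin" are hypothetical constructions for illustration. They are not claimed to match the paper's Σ_w(x) or Z_w(x).

```python
import numpy as np


def readout_flip_check(J_proxy, J_target, w, b, z, eps):
    """First-order flip test for an affine readout f(z) = w @ z + b.

    Hypothetical sketch, not the paper's definitions: the target shift is
    J_T d for an input direction d, and the proxy budget is ||J_P d|| <= eps.
    """
    B = J_proxy.T @ J_proxy                  # proxy Gram matrix (assumed positive definite)
    g = J_target.T @ w                       # readout direction pulled back to input coordinates
    quad = g @ np.linalg.solve(B, g)         # w^T M w with M = J_T B^-1 J_T^T
    reach = eps * np.sqrt(quad)              # largest first-order readout shift within the budget
    margin = abs(w @ z + b)                  # |f(z)|: how far the readout is from changing sign
    M = J_target @ np.linalg.solve(B, J_target.T)
    # Top eigenvalue of M equals the top generalized eigenvalue of (J_T^T J_T, J_P^T J_P).
    lam_star = np.linalg.eigvalsh(0.5 * (M + M.T))[-1]
    return {
        "adjusted_margin": margin / reach,   # <= 1 means a first-order flip is reachable
        "sensitivity": quad / (w @ w),       # readout-specific, never above lambda*
        "lambda_star": lam_star,
        # Worst case over all readout directions: flags whenever the readout test does.
        "lambda_star_flags": margin <= eps * np.linalg.norm(w) * np.sqrt(lam_star),
    }


# Toy usage (random numbers, illustrative only).
rng = np.random.default_rng(1)
J_P = rng.normal(size=(7, 4))                # toy proxy Jacobian, 4 local directions
J_T = rng.normal(size=(5, 4))                # toy target Jacobian, 5-dim target embedding
w, b, z = rng.normal(size=5), 0.1, rng.normal(size=5)
out = readout_flip_check(J_P, J_T, w, b, z, eps=0.05)
print(out)
```

In this construction, the λ*-based test flags an input whenever the readout-specific test does, and it can flag more. That asymmetry is one way to read the remark that λ*(x) can be overly conservative.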

Key insights

Semantic adversarial vulnerability is governed by the relative local geometry of proxy and target embedding models.

Method

A continuous local model of paraphrase perturbations uses Jacobians of proxy and target embedders to form a matrix pencil, whose top generalized eigenvalue defines the attackability index.
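Here is a minimal NumPy sketch of the index. It assumes the pencil is (J_Tᵀ J_T, J_Pᵀ J_P), built from a proxy Jacobian J_P and a target Jacobian J_T taken with respect to the same local perturbation coordinates. Under that assumption, λ*(x) is the largest value of ‖J_T v‖² / ‖J_P v‖² over directions v. It can be computed by a Cholesky reduction of the generalized problem to an ordinary symmetric one. The function name `attackability_index` and the random Jacobians are hypothetical. Real Jacobians would come from differentiating the two embedders.

```python
import numpy as np


def attackability_index(J_proxy, J_target):
    """Top generalized eigenvalue of the pencil (J_T^T J_T, J_P^T J_P).

    Assumed form (a sketch, not the paper's code): lambda* = max_v
    ||J_T v||^2 / ||J_P v||^2 over local perturbation directions v.
    Requires J_proxy to have full column rank so that J_P^T J_P is positive definite.
    """
    A = J_target.T @ J_target             # target Gram matrix, d x d
    B = J_proxy.T @ J_proxy               # proxy Gram matrix, d x d
    L = np.linalg.cholesky(B)             # B = L L^T with L lower triangular
    # Reduce A v = lam B v to C u = lam u with C = L^-1 A L^-T and v = L^-T u.
    X = np.linalg.solve(L, A)             # L^-1 A
    C = np.linalg.solve(L, X.T)           # L^-1 (L^-1 A)^T = L^-1 A L^-T, since A is symmetric
    C = 0.5 * (C + C.T)                   # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    lam = eigvals[-1]
    v = np.linalg.solve(L.T, eigvecs[:, -1])  # map the top eigenvector back to input coordinates
    return lam, v


# Usage with toy random Jacobians (illustrative only).
rng = np.random.default_rng(0)
d = 5                                     # hypothetical local perturbation dimension
J_P = rng.normal(size=(8, d))             # toy proxy Jacobian
J_T = rng.normal(size=(6, d))             # toy target Jacobian
lam, v = attackability_index(J_P, J_T)


def ratio(u):
    """Squared target displacement per squared proxy displacement along u."""
    return np.sum((J_T @ u) ** 2) / np.sum((J_P @ u) ** 2)


worst_random = max(ratio(rng.normal(size=d)) for _ in range(1000))
print(lam, ratio(v))                      # the two numbers agree up to round-off
print(worst_random <= lam + 1e-9)         # random directions never exceed lambda*
```

The Cholesky route requires J_Pᵀ J_P to be positive definite, that is, J_P must have full column rank. If the proxy is locally insensitive along some direction to which the target responds, the ratio is unbounded there. One option in that case is to add a small ridge to B before factorising. SciPy's `scipy.linalg.eigh(A, B)` solves the same generalized problem directly.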

Best for: Research Scientist, AI Scientist, NLP Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.