CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Advanced, quick

Summary

CLASP is a defense against Hidden State Poisoning Attacks (HiSPAs) on State Space Models (SSMs) such as Mamba and their hybrid variants. HiSPAs are a recently identified vulnerability in which adversarial strings corrupt an SSM's memory. CLASP frames HiSPA mitigation as token-level binary classification: it exploits distinctive patterns in Mamba's block output embeddings (BOEs) and uses an XGBoost classifier to flag malicious tokens. In a realistic résumé-screening scenario, CLASP achieved a 95.9% token-level F1 score and a 99.3% document-level F1 score on a corpus of 2,483 résumés (9.5M tokens). It also generalizes beyond the triggers seen in training, maintaining 96.9% document-level F1 under leave-one-out cross-validation and 91.6% under clustered cross-validation with novel triggers. CLASP runs independently of the downstream model, processing 1,032 tokens per second with under 4 GB of VRAM, which makes it practical for real-world deployment.
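
The summary reports F1 at two granularities, per token and per document. The sketch below shows one way the two scores can relate. The aggregation rule (a document counts as poisoned if at least `min_flagged` of its tokens are flagged) and every name here are assumptions for illustration; the summary does not say how the paper aggregates token predictions.

```python
from typing import List, Sequence, Tuple


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 computed as 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def document_label(token_labels: Sequence[int], min_flagged: int = 1) -> int:
    """Hypothetical rule: flag a document once enough tokens are flagged."""
    return int(sum(token_labels) >= min_flagged)


def token_and_document_f1(
    true_docs: List[List[int]],
    pred_docs: List[List[int]],
    min_flagged: int = 1,
) -> Tuple[float, float]:
    # Token level: pool every token from every document.
    flat_true = [t for doc in true_docs for t in doc]
    flat_pred = [p for doc in pred_docs for p in doc]
    # Document level: a document is truly poisoned if any token is malicious.
    doc_true = [document_label(doc, 1) for doc in true_docs]
    doc_pred = [document_label(doc, min_flagged) for doc in pred_docs]
    return f1(flat_true, flat_pred), f1(doc_true, doc_pred)


# Toy data: 1 marks a malicious token, 0 a benign one.
true_docs = [[0, 0, 1, 1, 0], [0, 0, 0, 0], [1, 1, 1, 0]]
pred_docs = [[0, 0, 1, 0, 0], [0, 0, 0, 0], [0, 1, 1, 0]]
token_f1, doc_f1 = token_and_document_f1(true_docs, pred_docs)
print(f"token-level F1 = {token_f1:.2f}, document-level F1 = {doc_f1:.2f}")
```

In this toy case two malicious tokens are missed, so token-level F1 is 0.75, yet both poisoned documents are still caught, so document-level F1 is 1.00. Under such a rule a detector can score higher per document than per token.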

Key takeaway

For AI scientists and CTOs deploying Mamba-based or hybrid LLMs, CLASP offers a lightweight defense against Hidden State Poisoning Attacks. Its high detection accuracy, generalization to novel triggers, and low computational overhead make it practical for securing applications such as résumé screening, catching poisoned inputs before they reach downstream models.

Key insights

CLASP defends Mamba-based LLMs from hidden state poisoning by classifying malicious tokens via block output embeddings.

Principles

Method

CLASP uses an XGBoost classifier on Mamba's block output embeddings (BOEs) to identify malicious tokens, treating HiSPA mitigation as a binary classification task.

Best for: AI Scientist, Research Scientist, CTO, AI Researcher, AI Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.