CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks
Summary
CLASP is a defense against Hidden State Poisoning Attacks (HiSPAs) in State Space Models (SSMs) such as Mamba and their hybrid variants. HiSPAs are a recently identified vulnerability in which adversarial strings corrupt an SSM's memory. CLASP frames HiSPA mitigation as a token-level binary classification problem: it exploits distinct patterns in Mamba's block output embeddings (BOEs) and uses an XGBoost classifier to detect malicious tokens. In a realistic scenario in which an LLM screens résumés, CLASP achieved a 95.9% token-level F1 score and a 99.3% document-level F1 score on a corpus of 2,483 résumés (9.5M tokens). It also generalizes well, maintaining 96.9% document-level F1 under leave-one-out cross-validation and 91.6% under clustered cross-validation with novel triggers. CLASP runs independently of the downstream model, processing 1,032 tokens per second with under 4 GB of VRAM, which makes it suitable for real-world deployment.
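The gap between the token-level and document-level scores is easier to read with a small worked example. The sketch below computes F1 at both granularities on three toy documents. The labels are invented for illustration, and the rule that a document counts as attacked when any of its tokens is flagged is an assumption, not something the summary states.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def document_label(token_labels):
    """Assumed aggregation rule: a document is positive if any token is."""
    return int(any(token_labels))


# Toy data (hypothetical): (true token labels, predicted token labels)
docs = [
    ([0, 0, 1, 1, 0], [0, 0, 1, 0, 0]),  # attacked; one malicious token missed
    ([0, 0, 0, 0],    [0, 0, 0, 0]),     # clean
    ([0, 1, 1, 0],    [0, 1, 1, 1]),     # attacked; one benign token flagged
]

token_true = [t for true, _ in docs for t in true]
token_pred = [p for _, pred in docs for p in pred]
token_f1 = f1_score(token_true, token_pred)

doc_true = [document_label(true) for true, _ in docs]
doc_pred = [document_label(pred) for _, pred in docs]
doc_f1 = f1_score(doc_true, doc_pred)

print(f"token-level F1:    {token_f1:.2f}")
print(f"document-level F1: {doc_f1:.2f}")
```

Under this aggregation rule, a detector can miss or over-flag individual tokens and still label every document correctly, which is one plausible reason a document-level score sits above the token-level one.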
Key takeaway
For AI Scientists and CTOs deploying Mamba-based or hybrid LLMs, CLASP offers a lightweight defense against Hidden State Poisoning Attacks. Its high detection accuracy, its generalization to unseen triggers, and its low computational overhead make it a practical way to secure applications such as résumé screening by catching poisoned input before it affects downstream models.
Key insights
CLASP defends Mamba-based LLMs from hidden state poisoning by classifying malicious tokens via block output embeddings.
Principles
- SSMs are vulnerable to hidden state poisoning.
- Attack patterns can be detected via BOE analysis.
Method
CLASP uses an XGBoost classifier on Mamba's block output embeddings (BOEs) to identify malicious tokens, treating HiSPA mitigation as a binary classification task.
In practice
- Deploy CLASP as a front-line defense.
- Use XGBoost for lightweight token-level detection.
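For capacity planning, the reported figures allow a rough back-of-the-envelope estimate. The sketch below assumes a single CLASP instance processing documents sequentially at the reported throughput, and treats the rounded 9.5M token count as exact, so the results are approximate.

```python
# Figures reported in the summary
CORPUS_TOKENS = 9_500_000   # rounded corpus size
CORPUS_DOCS = 2_483         # number of résumés
TOKENS_PER_SECOND = 1_032   # reported throughput

# Derived estimates (assumes one instance, constant throughput)
avg_tokens_per_doc = CORPUS_TOKENS / CORPUS_DOCS
seconds_per_doc = avg_tokens_per_doc / TOKENS_PER_SECOND
corpus_hours = CORPUS_TOKENS / TOKENS_PER_SECOND / 3600

print(f"average tokens per résumé: {avg_tokens_per_doc:,.0f}")
print(f"screening time per résumé: {seconds_per_doc:.1f} s")
print(f"time for the whole corpus: {corpus_hours:.1f} h")
```

On these assumptions an average résumé takes roughly 3.7 seconds to screen, and the full corpus takes about 2.6 hours on one instance.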
Topics
- State Space Models
- Hidden State Poisoning Attacks
- LLM Security
- Mamba Architecture
- XGBoost Classifier
Best for: AI Scientist, Research Scientist, CTO, AI Researcher, AI Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.