SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, medium

Summary

SlotGCG is a jailbreak attack method that exploits positional vulnerabilities in Large Language Models (LLMs) by systematically identifying and targeting "slots" for adversarial token insertion. Unlike traditional Greedy Coordinate Gradient (GCG) attacks, which typically append adversarial tokens as a suffix to the prompt, SlotGCG adds a position-search mechanism. It uses a Vulnerable Slot Score (VSS) to quantify how susceptible each token position is, then selects the most vulnerable slots for a targeted optimization attack. SlotGCG achieves a 14% higher Attack Success Rate (ASR) than GCG-based attacks, converges faster, and is more robust against defense methods, with a 42% higher ASR than baseline approaches. The method adds only 200 ms of preprocessing time and is attack-agnostic, so it can be plugged into other optimization-based attacks.

Key takeaway

If you are an AI security engineer developing LLM defense mechanisms, prioritize understanding and mitigating positional vulnerabilities. Defenses that assume adversarial tokens sit in a suffix are insufficient: SlotGCG demonstrates that adversarial tokens inserted at diverse "slots" significantly increase attack success rates and can bypass input filtering. Your red teaming efforts must expand beyond fixed insertion points to systematically explore and protect against these flexible attack strategies, potentially by integrating VSS-like metrics into your vulnerability assessments.

Key insights

LLM jailbreaking vulnerability is highly dependent on adversarial token insertion position, not just the tokens themselves.

Principles

Positional vulnerability varies across prompts.
Vulnerable slots correlate with attention patterns.
Attack effectiveness is position-driven.

Method

SlotGCG evaluates all prompt "slots" using a Vulnerable Slot Score (VSS), selects the highest-scoring slots, then runs a targeted optimization attack at those positions.
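The score-then-select step described above can be illustrated with a minimal sketch. It assumes per-slot scores have already been computed. The summary does not give the actual VSS formula, so the scores here are made-up numbers, and the function names (enumerate_slots, select_slots, insert_placeholders) and the "!" placeholder are hypothetical, not taken from the SlotGCG code base. The optimization step itself is not shown.

```python
from typing import List, Sequence


def enumerate_slots(prompt_tokens: Sequence[str]) -> List[int]:
    """Return every candidate slot: before each token, plus one after the last."""
    return list(range(len(prompt_tokens) + 1))


def select_slots(scores: Sequence[float], k: int) -> List[int]:
    """Pick the k highest-scoring slots and return them in prompt order.

    `scores` stands in for a VSS-like metric (hypothetical values here).
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def insert_placeholders(
    prompt_tokens: Sequence[str], slots: Sequence[int], placeholder: str = "!"
) -> List[str]:
    """Insert one placeholder token at each selected slot.

    A later optimization step (not shown) would update these placeholders.
    """
    slot_set = set(slots)
    out: List[str] = []
    for i, tok in enumerate(prompt_tokens):
        if i in slot_set:
            out.append(placeholder)
        out.append(tok)
    if len(prompt_tokens) in slot_set:  # the slot after the final token
        out.append(placeholder)
    return out


tokens = ["Tell", "me", "a", "story"]
scores = [0.1, 0.7, 0.2, 0.9, 0.3]  # illustrative only, one score per slot
chosen = select_slots(scores, k=2)
templated = insert_placeholders(tokens, chosen)
```

A suffix-only attack corresponds to always choosing the last slot. The sketch makes the difference visible: the chosen positions depend on the scores rather than being fixed in advance.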

In practice

Implement VSS to identify LLM weak points.
Integrate SlotGCG into existing red teaming.
Diversify adversarial token insertion points.
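
One way to act on these points in a vulnerability assessment is to break attack success rate down by where in the prompt the adversarial tokens were inserted. The sketch below is an assumption about how such a report might look, not part of SlotGCG: the record format (slot index, prompt length, success flag), the asr_by_position name, and the bucketing scheme are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (slot_index, prompt_length_in_tokens, attack_succeeded)
Record = Tuple[int, int, bool]


def asr_by_position(records: Iterable[Record], num_buckets: int = 4) -> Dict[int, float]:
    """Group red-team results by relative insertion position and compute ASR per bucket.

    Bucket 0 covers insertions near the start of the prompt, and the last
    bucket covers insertions near the end (including a plain suffix).
    """
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for slot, length, success in records:
        rel = slot / length if length else 0.0  # slot ranges over 0..length
        bucket = min(int(rel * num_buckets), num_buckets - 1)
        totals[bucket] += 1
        hits[bucket] += int(success)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Illustrative data only; these are not results from the paper.
results = [
    (0, 4, False),
    (1, 4, True),
    (2, 4, True),
    (4, 4, True),
    (4, 4, False),
]
report = asr_by_position(results)
```

If the per-bucket rates differ sharply, that is a sign that defenses tuned against suffix insertion are leaving other positions uncovered.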

Topics

LLM Jailbreaking
Adversarial Attacks
Greedy Coordinate Gradient
Positional Vulnerability
Red Teaming
AI Safety

Code references

youai058/SlotGCG

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.