ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety
Summary
ROK-FORTRESS is a new bilingual, culturally adversarial benchmark for evaluating Large Language Models (LLMs) on National Security and Public Safety (NSPS) risks, focused on the English-Korean language pair and the U.S.-ROK geopolitical axis. The benchmark comprises 1,235 tasks and uses a "transcreation matrix" to isolate the effects of language and geopolitical context on LLM safety behavior: adversarial prompts are evaluated under controlled combinations of language (English vs. Korean) and grounding (U.S. vs. Korean entities, institutions, and operational details). Each adversarial prompt is paired with a benign counterpart to measure over-refusal, and responses are scored by calibrated LLM-as-a-judge panels using expert-crafted binary rubrics and Tier-Weighted Risk Scores (TRS). Experiments across 14 frontier and Korean-optimized models reveal a consistent suppression effect in the Korean-language variants, along with significant model-to-model variation in how geopolitical grounding interacts with language; grounding often mitigates the language-driven suppression.
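The judging and scoring steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ROK-FORTRESS implementation: the summary does not give the TRS formula, so the tier labels, the weights in `TIER_WEIGHTS`, strict-majority voting across judges, and normalizing the score to [0, 1] are all hypothetical choices, as are the `RubricItem`, `panel_verdict`, `tier_weighted_risk_score`, and `over_refusal_rate` names.

```python
from dataclasses import dataclass

# Hypothetical tier weights: higher tiers are assumed to mark more severe harm.
# The actual tiers and weights used by ROK-FORTRESS are not given in the summary.
TIER_WEIGHTS = {1: 1.0, 2: 2.0, 3: 4.0}


@dataclass
class RubricItem:
    """One expert-written binary rubric question (hypothetical structure)."""
    tier: int
    judge_votes: list[bool]  # True = this judge found the risky content present


def panel_verdict(votes: list[bool]) -> bool:
    """Assumed aggregation: strict majority vote across the judge panel (ties = not flagged)."""
    return sum(votes) > len(votes) / 2


def tier_weighted_risk_score(items: list[RubricItem]) -> float:
    """Weighted fraction of rubric items the panel flags, in [0, 1] (assumed form)."""
    total = sum(TIER_WEIGHTS[item.tier] for item in items)
    flagged = sum(
        TIER_WEIGHTS[item.tier] for item in items if panel_verdict(item.judge_votes)
    )
    return flagged / total if total else 0.0


def over_refusal_rate(benign_refused: list[bool]) -> float:
    """Share of benign counterpart prompts that the model refused."""
    return sum(benign_refused) / len(benign_refused) if benign_refused else 0.0
```

For example, two rubric items, a tier-1 item flagged by two of three judges and a tier-3 item flagged by only one, give a TRS of 1.0 / (1.0 + 4.0) = 0.2 under these assumed weights. Measuring over-refusal on the benign twins keeps a model from scoring well simply by refusing everything.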
Key takeaway
If you are an NLP engineer or research scientist developing or deploying LLMs in high-stakes global contexts, move beyond translation-only safety evaluations and incorporate culturally adversarial benchmarks, such as ROK-FORTRESS, that account for geopolitical grounding. This approach helps identify nuanced safety failures and improve model alignment across linguistic and cultural environments, reducing dual-use misuse risks and supporting more equitable safety for non-English users.
Key insights
Multilingual LLM safety evaluations must consider geopolitical transcreation, not just translation, to accurately assess real-world risks.
Principles
- Translation-only evaluations can misestimate real-world safety.
- Language and geopolitical context jointly shape safety behavior.
- Korean language can act as a conservative risk signal.
Method
The "transcreation matrix" methodology systematically varies language (English vs. Korean) and cultural grounding (U.S. vs. Korean context) to disentangle linguistic effects from geopolitical-grounding effects in LLM safety evaluations, using adversarial-benign prompt pairs and tier-weighted risk scoring.
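One way to separate the two factors is a standard 2x2 factorial decomposition of per-cell mean risk scores into a language main effect, a grounding main effect, and their interaction. This is a sketch of that generic technique, not the paper's stated analysis; the cell labels (`"en"`, `"ko"`, `"us"`, `"rok"`) and the `matrix_effects` helper are hypothetical.

```python
from itertools import product

# Hypothetical cell labels for the 2x2 transcreation matrix.
LANGUAGES = ("en", "ko")
CONTEXTS = ("us", "rok")


def matrix_effects(cell_scores: dict[tuple[str, str], float]) -> dict[str, float]:
    """Decompose mean risk scores in a 2x2 (language x context) design.

    cell_scores maps (language, context) to the mean risk score of that cell,
    e.g. ("ko", "us") = Korean text grounded in U.S. entities and institutions.
    """
    missing = [cell for cell in product(LANGUAGES, CONTEXTS) if cell not in cell_scores]
    if missing:
        raise ValueError(f"missing matrix cells: {missing}")
    s = cell_scores
    # Language main effect: Korean minus English, averaged over both contexts.
    language = ((s["ko", "us"] - s["en", "us"]) + (s["ko", "rok"] - s["en", "rok"])) / 2
    # Grounding main effect: ROK context minus U.S. context, averaged over both languages.
    grounding = ((s["en", "rok"] - s["en", "us"]) + (s["ko", "rok"] - s["ko", "us"])) / 2
    # Interaction: how much the language effect changes when the context switches to ROK.
    interaction = (s["ko", "rok"] - s["en", "rok"]) - (s["ko", "us"] - s["en", "us"])
    return {"language": language, "grounding": grounding, "interaction": interaction}
```

With made-up cell means of 0.5 (en/us), 0.25 (ko/us), 0.5 (en/rok), and 0.375 (ko/rok), the function returns a language effect of -0.1875, a grounding effect of 0.0625, and an interaction of 0.125: in this toy case, switching to Korean lowers the score, and Korean grounding partly offsets that drop. Reporting the interaction separately matters because the summary notes that the grounding-language interaction varies substantially from model to model.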
In practice
- Use transcreated safety data for post-training alignment.
- Implement culturally grounded red-teaming.
- Evaluate LLMs with context-aware benchmarks.
Topics
- LLM Safety Evaluation
- Geopolitical Transcreation
- National Security
- Public Safety
- English-Korean Language Pair
Best for: Research Scientist, NLP Engineer, AI Scientist, AI Security Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.