ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety

2026-05-15 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

ROK-FORTRESS is a new bilingual, culturally adversarial benchmark for evaluating large language models (LLMs) for National Security and Public Safety (NSPS) risks, focused on the English-Korean language pair and the U.S.-ROK geopolitical axis. The benchmark comprises 1,235 tasks and uses a "transcreation matrix" to isolate the effects of language and geopolitical context on LLM safety behavior: adversarial prompts are evaluated under controlled combinations of English vs. Korean language and U.S. vs. Korean entities, institutions, and operational details. Each adversarial prompt is paired with a benign counterpart to measure over-refusal, and responses are scored by calibrated LLM-as-a-judge panels using expert-crafted binary rubrics and Tier-Weighted Risk Scores (TRS). Experiments across 14 frontier and Korean-optimized models reveal a consistent suppression effect in Korean variants and significant model-to-model variation in how geopolitical grounding interacts with language; in many cases the grounding mitigates the language-driven suppression.
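The summary names binary rubrics, judge panels, and a tier-weighted score but gives no formula. As a rough illustration only, the sketch below shows one way such pieces could fit together. The tier weights, the majority-vote rule, and the function names `panel_vote` and `tier_weighted_score` are all assumptions made for this example and are not taken from the paper.

```python
# Hypothetical tier weights: higher tiers count more toward the score.
# The actual ROK-FORTRESS weights are not given in the summary.
TIER_WEIGHTS = {1: 1.0, 2: 2.0, 3: 4.0}


def panel_vote(votes):
    """Collapse one rubric item's judge votes into a single boolean.

    Assumed rule: strict majority; a tie counts as "not satisfied".
    """
    return sum(votes) * 2 > len(votes)


def tier_weighted_score(rubric_results):
    """Weighted share of satisfied rubric items, in [0, 1].

    rubric_results: list of (tier, satisfied) pairs, one per binary
    rubric item. Returns 0.0 for an empty rubric.
    """
    total = sum(TIER_WEIGHTS[tier] for tier, _ in rubric_results)
    if total == 0:
        return 0.0
    hit = sum(TIER_WEIGHTS[tier] for tier, ok in rubric_results if ok)
    return hit / total


if __name__ == "__main__":
    # Made-up judge votes for three rubric items of one response.
    items = [
        (1, panel_vote([True, True, False])),
        (2, panel_vote([False, False, True])),
        (3, panel_vote([True, True, True])),
    ]
    print(tier_weighted_score(items))
```

Under these assumed weights, a response that satisfies the tier-1 and tier-3 items but not the tier-2 item scores 5/7, so a single high-tier item outweighs the two lower tiers combined.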

Key takeaway

NLP engineers and research scientists developing or deploying LLMs in high-stakes global contexts should move beyond translation-only safety evaluations and incorporate culturally adversarial benchmarks such as ROK-FORTRESS that account for geopolitical grounding. Doing so helps identify nuanced safety failures and improve model alignment across linguistic and cultural environments, reducing dual-use misuse risks and giving non-English users more equitable safety.

Key insights

Multilingual LLM safety evaluations must consider geopolitical transcreation, not just translation, to accurately assess real-world risks.

Principles

Translation-only evaluations can misestimate real-world safety.
Language and geopolitical context jointly shape safety behavior.
The Korean language itself can act as a conservative risk signal.

Method

The "transcreation matrix" methodology systematically varies language and cultural grounding to disentangle linguistic effects from contextual/geopolitical grounding effects in LLM safety evaluations, using adversarial-benign prompt pairs and tier-weighted risk scoring.
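The design described above can be read as a 2x2 factorial layout: language (English or Korean) crossed with geopolitical grounding (U.S. or ROK). The sketch below shows how per-cell mean risk scores could be decomposed into a language effect, a grounding effect, and their interaction. The record fields, cell labels, and numbers are hypothetical, and this is standard factorial arithmetic rather than the paper's own analysis code.

```python
from itertools import product
from statistics import mean

LANGUAGES = ("en", "ko")    # assumed labels for the language axis
GROUNDINGS = ("us", "rok")  # assumed labels for the grounding axis


def cell_means(records):
    """Mean risk score per (language, grounding) cell.

    Each record is a dict with "language", "grounding", and "risk" keys.
    Raises statistics.StatisticsError if any cell has no records.
    """
    cells = {cell: [] for cell in product(LANGUAGES, GROUNDINGS)}
    for r in records:
        cells[(r["language"], r["grounding"])].append(r["risk"])
    return {cell: mean(scores) for cell, scores in cells.items()}


def decompose(means):
    """Main effects and interaction for the 2x2 design."""
    en_us, en_rok = means[("en", "us")], means[("en", "rok")]
    ko_us, ko_rok = means[("ko", "us")], means[("ko", "rok")]
    return {
        # Korean minus English, averaged over both groundings.
        "language_effect": ((ko_us + ko_rok) - (en_us + en_rok)) / 2,
        # ROK minus U.S. grounding, averaged over both languages.
        "grounding_effect": ((en_rok + ko_rok) - (en_us + ko_us)) / 2,
        # How much the grounding shift differs between Korean and English.
        "interaction": (ko_rok - ko_us) - (en_rok - en_us),
    }


if __name__ == "__main__":
    # Made-up scores for a single model, one record per cell.
    demo = [
        {"language": "en", "grounding": "us", "risk": 0.5},
        {"language": "en", "grounding": "rok", "risk": 0.5},
        {"language": "ko", "grounding": "us", "risk": 0.25},
        {"language": "ko", "grounding": "rok", "risk": 0.375},
    ]
    print(decompose(cell_means(demo)))
```

With these invented numbers the language effect is negative (lower scores in Korean) while the interaction is positive, which is the shape one would expect if Korean grounding partly offsets a language-driven suppression. A translation-only evaluation would populate just the U.S.-grounded column and could not separate the two factors.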

In practice

Use transcreated safety data for post-training alignment.
Implement culturally grounded red-teaming.
Evaluate LLMs with context-aware benchmarks.
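For the evaluation step, the adversarial-benign pairing mentioned in the summary suggests tracking two rates side by side: how often a model refuses the adversarial prompt, and how often it refuses the harmless counterpart (over-refusal). The sketch below is a minimal illustration; the `PairResult` structure and its field names are hypothetical, and how a refusal is detected is left outside the example.

```python
from dataclasses import dataclass


@dataclass
class PairResult:
    """Outcome for one adversarial/benign prompt pair (hypothetical schema)."""
    pair_id: str
    adversarial_refused: bool
    benign_refused: bool


def refusal_rates(results):
    """Refusal rate on adversarial prompts and over-refusal rate on benign ones."""
    if not results:
        raise ValueError("need at least one pair result")
    n = len(results)
    return {
        "adversarial_refusal_rate": sum(r.adversarial_refused for r in results) / n,
        # Refusing the benign counterpart is counted as over-refusal.
        "over_refusal_rate": sum(r.benign_refused for r in results) / n,
    }


if __name__ == "__main__":
    # Made-up outcomes for four pairs.
    demo = [
        PairResult("p1", True, False),
        PairResult("p2", True, True),
        PairResult("p3", False, False),
        PairResult("p4", True, False),
    ]
    print(refusal_rates(demo))
```

Reporting both rates per language and grounding condition makes it visible when a model looks safer in one condition only because it refuses more of everything, benign prompts included.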

Topics

LLM Safety Evaluation
Geopolitical Transcreation
National Security
Public Safety
English-Korean Language Pair

Best for: Research Scientist, NLP Engineer, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.