TukaBench: A Culturally Grounded Jailbreak Benchmark for African Languages

2026-05-31 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

TUKABENCH is a jailbreak benchmark for evaluating Large Language Model (LLM) safety in seven African languages, built to counter the English-centric bias of current safety evaluations. It extends JailbreakBench (JBB) with four prompt settings: JBB prompts translated directly by humans; English prompts adapted to African contexts and then human-translated; human-curated prompts validated through interactions with GPT-5.2; and code-switched prompts that mix English with an African language. Across both closed and open models, prompting in African languages generally lowers refusal rates relative to English, and culturally adapted prompts yield the lowest refusal rates. The study also identifies two structural limitations: LLM comprehension failures, and reduced reliability of LLM-as-a-judge in Low-Resource Languages (LRLs). To address them, TUKABENCH introduces "Deflection" as a new metric alongside "Refused" and "Jailbroken," and validates judge outputs against human annotations, which show that judge-human agreement drops for LRLs and for less commonly supported scripts.

Key takeaway

If you are an NLP engineer or AI security engineer deploying LLMs globally, especially in African language markets, English-centric safety benchmarks alone are not sufficient. Your evaluations should include culturally grounded, language-specific prompts, because these significantly change refusal rates and expose comprehension failures that English prompts do not. Prioritize human validation of LLM-as-a-judge outputs in Low-Resource Languages so that safety assessments stay accurate, and track the "Deflection" metric to catch subtle model failures that a simple refused-or-not count would miss.

Key insights

Prompting in African languages reduces LLM refusal rates, but it also exposes comprehension failures and unreliable LLM-as-a-judge scoring in low-resource settings.

Principles

LLM safety evaluations are heavily English-centric.
Culturally adapted prompts significantly impact LLM refusal rates.
LLM-as-a-judge reliability decreases in Low-Resource Languages.

Method

TUKABENCH extends JailbreakBench with four prompt settings: direct human translation, English prompts culturally adapted and then human-translated, human-curated prompts (validated with GPT-5.2), and code-switched prompts. It introduces "Deflection" to capture comprehension failures and uses human annotations to check how reliable the LLM judge is.
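
As a rough illustration of how a three-way outcome scheme could be tallied per language and prompt setting, here is a minimal Python sketch. It is not the benchmark's own code: the record layout, the function name label_rates, the placeholder language code "lang_a", and the setting names ("translated", "adapted") are all assumptions made for this example, and the counts are made up.

```python
from collections import Counter, defaultdict

# The three outcome labels named in the summary (lowercased here by assumption).
LABELS = ("refused", "deflection", "jailbroken")


def label_rates(records):
    """Return per-(language, setting) rates for each outcome label.

    `records` is an iterable of (language, setting, label) tuples.
    This layout is hypothetical, chosen only for the sketch.
    """
    counts = defaultdict(Counter)
    for language, setting, label in records:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        counts[(language, setting)][label] += 1

    rates = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Report every label, including those with zero occurrences.
        rates[key] = {label: counter[label] / total for label in LABELS}
    return rates


if __name__ == "__main__":
    # Illustrative records only; not results from the study.
    demo = [
        ("lang_a", "translated", "refused"),
        ("lang_a", "translated", "refused"),
        ("lang_a", "translated", "deflection"),
        ("lang_a", "translated", "jailbroken"),
        ("lang_a", "adapted", "jailbroken"),
        ("lang_a", "adapted", "refused"),
    ]
    for key, value in sorted(label_rates(demo).items()):
        print(key, value)
```

Keeping deflection as its own label, rather than folding it into either of the other two, means a model that simply fails to understand the prompt is not counted as having refused safely.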

In practice

Employ culturally adapted prompts for LLM safety evaluations in LRLs.
Integrate human validation for LLM-as-a-judge outputs in LRLs.
Monitor "Deflection" as a metric for LLM comprehension failures.
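
One way to put the human-validation step into practice is to measure how often the LLM judge and a human annotator assign the same label, per language. The sketch below uses Cohen's kappa for this; that choice is an assumption, since the summary does not say which agreement statistic the study reports. The function name cohen_kappa and the sample labels are likewise hypothetical.

```python
from collections import Counter


def cohen_kappa(judge_labels, human_labels):
    """Cohen's kappa between two equal-length label sequences.

    Kappa corrects raw agreement for the agreement expected by chance,
    given each rater's own label frequencies.
    """
    if len(judge_labels) != len(human_labels) or not judge_labels:
        raise ValueError("label sequences must be non-empty and equal length")

    n = len(judge_labels)
    observed = sum(j == h for j, h in zip(judge_labels, human_labels)) / n

    judge_counts = Counter(judge_labels)
    human_counts = Counter(human_labels)
    expected = sum(
        judge_counts[label] * human_counts[label]
        for label in set(judge_counts) | set(human_counts)
    ) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Illustrative labels only; not data from the study.
    judge = ["refused", "refused", "jailbroken", "deflection"]
    human = ["refused", "jailbroken", "jailbroken", "deflection"]
    print(round(cohen_kappa(judge, human), 3))
```

Computing this separately for each language and script makes it easier to see where judge outputs need more human review before they are trusted.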

Topics

LLM Safety Evaluation
African Languages
Jailbreak Benchmarks
Low-Resource Languages
Prompt Engineering
Model Comprehension

Best for: Research Scientist, AI Scientist, NLP Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.