Korean Culture into LLM Alignment: Toward Cultural Coherence
Summary
A new alignment-data pipeline tackles the integration of Korean culture into large language models (LLMs). Rather than only suppressing harmful outputs, it defines what a culturally coherent response looks like. Instantiated for Korean, the pipeline uses a prompt-based LLM seed generator to expand a Korean harm taxonomy. At its core is a safe-response policy adapted to Korean culture and grounded in Korean legal frameworks, social norms, and interpretive conventions. Direct Preference Optimization (DPO) fine-tuning on the 10,000 resulting triplets significantly improved the Korean cultural safe rate across six open-weight LLMs (A.X-4.0-Light, EXAONE-3.5, Kanana-1.5, Qwen-2.5, Gemma-3, and Llama-3.1), with an average gain of +6.59 points on Korset. This improvement came with no large degradation on Korean general-capability benchmarks, and the fine-tuned models provided constructive, context-specific information, often citing Korean statutes.
Key takeaway
For AI scientists and machine learning engineers deploying LLMs in specific cultural regions, local cultural coherence deserves priority. Global alignment data may be insufficient, leading to culturally inappropriate responses or over-refusal. Consider adopting a constructive, locale-specific alignment pipeline that integrates local legal frameworks, social norms, and interpretive conventions. This approach, demonstrated for Korean, improves cultural safety and helpfulness without large degradation in general capabilities, which helps make models useful and trusted in target markets.
Key insights
LLM cultural alignment requires constructive, locale-specific definitions of coherence that go beyond mere output suppression.
Principles
- Cultural alignment must define what is coherent, not just what to avoid.
- Queries and responses should be tightly grounded in the target culture.
- Refusals must be grounded in local norms rather than remaining superficial.
Method
A pipeline generates DPO triplets by expanding a Korean harm taxonomy, mining hard cases with an attacker LLM, and producing safe responses with a multi-model generator conditioned on a Korean-adapted policy, with outputs filtered by a three-judge ensemble.
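The filtering step of such a pipeline can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the summary does not say how the three judges' votes are combined or where the rejected response comes from, so the unanimous-vote rule, the `Triplet` fields, and every function name here (`passes_ensemble`, `build_triplets`, the toy generators and judges) are assumptions made for the example. Real generators and judges would be LLM calls rather than string checks.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Triplet:
    prompt: str
    chosen: str    # safe response that passed the judge ensemble
    rejected: str  # dispreferred response (its source is an assumption here)

# Hypothetical interfaces: a generator maps a prompt to a candidate response,
# a judge maps (prompt, response) to True when the response is acceptable.
Generator = Callable[[str], str]
Judge = Callable[[str, str], bool]

def passes_ensemble(prompt: str, response: str,
                    judges: List[Judge], min_votes: int) -> bool:
    """Return True when at least `min_votes` judges accept the response."""
    votes = sum(1 for judge in judges if judge(prompt, response))
    return votes >= min_votes

def build_triplets(prompts: List[str], generators: List[Generator],
                   judges: List[Judge], rejected_for: Callable[[str], str],
                   min_votes: int = 3) -> List[Triplet]:
    """Keep one accepted candidate per prompt; skip prompts with none."""
    triplets = []
    for prompt in prompts:
        candidates = [generate(prompt) for generate in generators]
        accepted = [c for c in candidates
                    if passes_ensemble(prompt, c, judges, min_votes)]
        if accepted:
            triplets.append(Triplet(prompt, accepted[0], rejected_for(prompt)))
    return triplets

# Toy stand-ins so the sketch runs without any model.
def gen_bare_refusal(prompt: str) -> str:
    return "I cannot help with that."

def gen_constructive(prompt: str) -> str:
    return "That is restricted; instead, consider a lawful alternative."

toy_judges: List[Judge] = [
    lambda p, r: len(r.strip()) > 0,                # non-empty
    lambda p, r: r != "I cannot help with that.",   # not a bare refusal
    lambda p, r: "instead" in r.lower(),            # offers an alternative
]

demo = build_triplets(["example query"],
                      [gen_bare_refusal, gen_constructive],
                      toy_judges,
                      rejected_for=lambda p: "unsafe draft")
```

Requiring all three votes (`min_votes=3`) is the strictest choice; a majority rule (`min_votes=2`) would keep more data at the cost of noisier preferred responses.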
In practice
- Develop harm taxonomies anchored in local legal/social contexts.
- Ensure responses name applicable local statutes and norms.
- Offer constructive, context-relevant alternatives alongside refusals.
Topics
- LLM Alignment
- Cultural Coherence
- Korean NLP
- DPO Fine-tuning
- AI Safety
- Harm Taxonomy
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.