CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters
Summary
CuMA (Cultural Mixture of Adapters) is a framework for aligning Large Language Models (LLMs) with diverse cultural values. It targets "Mean Collapse": when a dense model is forced to fit conflicting value distributions, it converges to a generic average and fails to represent distinct groups. The summary attributes this to "Cultural Sparsity" and gradient interference. CuMA frames alignment as a conditional capacity separation problem and uses demographic-aware routing to disentangle conflicting gradients into specialized expert subspaces. By conditioning expert selection on both semantic content and the user's demographic profile, it learns a Latent Cultural Topology. CuMA was evaluated on WorldValuesBench, Community Alignment (CA), and PRISM with Llama-3.1-8B-Instruct and Qwen3-8B backbones. It reduces distributional divergence (Earth Mover's Distance, EMD, down to 0.1876), outperforms dense baselines by over 5% in accuracy, and reaches Win-Rates of 78.2% on CA and 76.8% on PRISM, which indicates that it mitigates mean collapse while preserving cultural diversity.
Key takeaway
Machine Learning Engineers building LLMs for a global audience should consider moving beyond monolithic alignment. A single dense model trained on conflicting value distributions is prone to "Mean Collapse" and tends to produce generic, Western-centric responses. A framework like CuMA, which combines demographic-aware routing with specialized adapters, explicitly disentangles conflicting cultural values. This lets the model generate culturally resonant outputs with better accuracy and diversity, as shown by CuMA's EMD and Win-Rate results on benchmarks such as PRISM.
Key insights
CuMA uses demographic-aware routing and specialized adapters to prevent LLMs from collapsing diverse cultural values into a generic average.
Principles
- Human values exhibit Cultural Sparsity.
- Dense models suffer gradient interference.
- Capacity separation prevents Mean Collapse.
Method
CuMA linearizes demographic profiles into embeddings, concatenates them with semantic hidden states, and uses a router to select Top-k LoRA experts. This isolates conflicting cultural gradients.
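The routing step described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the class name DemographicAwareLoRALayer, the dimensions, the random initialization, and the softmax over the selected Top-k logits are all assumptions made for the example, and the demographic embedding is assumed to be computed upstream from the linearized profile.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

class DemographicAwareLoRALayer:
    """Hypothetical sketch: a router picks Top-k LoRA experts from [hidden ; demo]."""

    def __init__(self, d_model, d_demo, n_experts, rank, top_k, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Router scores every expert from the concatenated input.
        self.W_router = rng.normal(0.0, 0.02, (n_experts, d_model + d_demo))
        # Each expert is a low-rank update B_i @ A_i.
        self.A = rng.normal(0.0, 0.02, (n_experts, rank, d_model))
        # LoRA usually initializes B to zero; random here so the demo output is nonzero.
        self.B = rng.normal(0.0, 0.02, (n_experts, d_model, rank))

    def route(self, h, demo):
        # Concatenate the semantic hidden state with the demographic embedding.
        logits = self.W_router @ np.concatenate([h, demo])
        # Indices of the k highest-scoring experts, best first.
        top = np.argsort(logits)[-self.top_k:][::-1]
        # Assumption: gate weights are a softmax over the selected logits only.
        return top, softmax(logits[top])

    def forward(self, h, demo):
        top, weights = self.route(h, demo)
        delta = np.zeros_like(h)
        for idx, w in zip(top, weights):
            # Only the selected experts contribute to the adapter update.
            delta += w * (self.B[idx] @ (self.A[idx] @ h))
        # delta would be added to the frozen base layer's output.
        return delta, top, weights

layer = DemographicAwareLoRALayer(d_model=16, d_demo=4, n_experts=6, rank=2, top_k=2)
hidden = np.ones(16)                       # placeholder semantic hidden state
demo = np.array([1.0, 0.0, 0.0, 0.0])      # placeholder demographic embedding
delta, experts, gates = layer.forward(hidden, demo)
print(experts, gates)
```

Because only the selected experts take part in the forward pass, only their parameters would receive gradients for a given example, which is the sense in which routing separates capacity across groups.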
In practice
- Implement demographic-aware routing for pluralistic LLM alignment.
- Use LoRA adapters as specialized cultural experts.
- Evaluate alignment with EMD and Distinct-2 scores.
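The two metrics named in the last item can be computed along these lines. This is a hedged sketch under stated assumptions: emd_1d treats answer options as ordinal categories with unit spacing, and distinct_n uses whitespace tokenization. The function names are illustrative, and the benchmarks' own normalization and tokenization may differ.

```python
from itertools import accumulate

def emd_1d(p, q):
    """Earth Mover's Distance between two histograms over the same ordinal options.

    Assumes unit spacing between adjacent options, so the distance equals the
    sum of absolute differences between the two cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    cdf_p = accumulate(x / total_p for x in p)
    cdf_q = accumulate(x / total_q for x in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))

def distinct_n(texts, n=2):
    """Ratio of unique n-grams to total n-grams across a set of generations."""
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()  # assumption: whitespace tokenization
        grams = list(zip(*(tokens[i:] for i in range(n))))
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0

# Model answer distribution vs. a survey distribution over a 4-point scale.
print(emd_1d([0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]))
# Distinct-2 over two generations that share one bigram.
print(distinct_n(["a b c", "a b d"]))
```

Lower EMD means the predicted answer distribution sits closer to the reference distribution, while a higher Distinct-2 means less repeated phrasing across generations.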
Topics
- Cultural Alignment
- Mixture of Adapters
- Demographic-Aware Routing
- Mean Collapse
- LoRA
- Pluralistic LLMs
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.