GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Natural Language Processing) · Depth: Expert, extended

Summary

GEMS is a training-free method that enables multi-semantic superposition in large language models by addressing two independent sources of collapse: distributional deviation and directional interference. It applies geometric constraints (norm-preserving weighted superposition, targeted injection into the attention pathway at o_proj, and real-time orthogonalization) together with a Gaussian envelope that modulates injection strength across layers. On the GSM8K benchmark, GEMS maintained 98% accuracy when injecting three concurrent non-mathematical directions, whereas unconstrained addition collapsed to 4% (baseline 92%). For language modeling, the same injection on Wikitext-2 incurred only a 2.2% perplexity increase. The method shows qualitative steering effects and transfers across models from 3B to 31B parameters, including Llama-3.2-3B, Qwen3.6-27B, and Gemma-4-31B.

Key takeaway

For machine learning engineers building LLM applications that need nuanced, multi-attribute control, GEMS offers a robust, training-free option. Integrate its geometric constraints, such as norm preservation and orthogonalization, to steer multiple semantic directions simultaneously without collapsing the model or degrading its core capabilities. Your team can achieve fine-grained behavioral control, for example balancing factual accuracy with communication style, with minimal overhead.

Key insights

Geometric constraints enable robust multi-directional activation steering in LLMs by preventing norm accumulation and directional interference.
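The norm-accumulation problem named above can be illustrated with a small sketch. It is not the paper's implementation: the rescaling rule in `rescale_to`, the dimension, the weights, and all function names are assumptions chosen for illustration. The source states only that the superposition is norm-preserving. The sketch adds three weighted directions to a hidden vector, then rescales the result back to the original L2 norm.

```python
import math
import random

def norm(v):
    """L2 norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def add_scaled(h, directions, weights):
    """Unconstrained addition: h + sum_i w_i * d_i."""
    out = list(h)
    for d, w in zip(directions, weights):
        for j, x in enumerate(d):
            out[j] += w * x
    return out

def rescale_to(v, target_norm):
    """Rescale v so its L2 norm equals target_norm (assumed constraint)."""
    n = norm(v)
    if n == 0.0:
        return list(v)  # a zero vector cannot be rescaled
    return [x * target_norm / n for x in v]

random.seed(0)
dim = 64  # hypothetical hidden size
h = [random.gauss(0.0, 1.0) for _ in range(dim)]
directions = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]
weights = [4.0, 4.0, 4.0]  # hypothetical steering strengths

unconstrained = add_scaled(h, directions, weights)
constrained = rescale_to(unconstrained, norm(h))

print("original norm:     ", norm(h))
print("unconstrained norm:", norm(unconstrained))
print("constrained norm:  ", norm(constrained))
```

With unconstrained addition the norm grows with every injected direction, which pushes activations away from the distribution later layers expect. Rescaling keeps the magnitude fixed and changes only the direction.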

Method

GEMS applies real-time Gram-Schmidt orthogonalization, norm-constrained weighted superposition at the attention output projection (o_proj), and a Gaussian envelope for inter-layer strength modulation during forward propagation.
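Two of the components above, Gram-Schmidt orthogonalization and the Gaussian envelope, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the envelope parameters `center` and `width`, the 32-layer count, and the toy 3-dimensional directions are all hypothetical, and the exact envelope form used by GEMS is not given in this summary.

```python
import math

def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors, eps=1e-10):
    """Orthonormalize steering directions (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)  # component of w along an accepted direction
            w = [x - c * y for x, y in zip(w, b)]
        n = math.sqrt(dot(w, w))
        if n > eps:  # skip directions that are numerically dependent
            basis.append([x / n for x in w])
    return basis

def gaussian_envelope(layer, center, width):
    """Per-layer strength multiplier, equal to 1.0 at `center` (assumed form)."""
    return math.exp(-((layer - center) ** 2) / (2.0 * width ** 2))

# Three toy, non-orthogonal steering directions.
dirs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
basis = gram_schmidt(dirs)

# Injection strength per layer for a hypothetical 32-layer model.
strengths = [gaussian_envelope(l, center=16, width=4.0) for l in range(32)]
```

After orthogonalization, the directions no longer overlap, so strengthening one does not also push the activation along another. The envelope concentrates injection in the middle layers and tapers it toward the first and last layers.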

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.