[R] Empirical evidence for a primitive layer in small language models — 18 experiments across 4 architectures

2026-03-17 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Advanced, medium

Summary

Researchers conducted 18 experiments across four small language model architectures (Qwen 2.5, Gemma 3, LLaMA 3.2, SmolLM2) ranging from 360M to 1B parameters. The study identified a consistent activation gap between "scaffolding primitives" (Layer 0a: SOMEONE, TIME, PLACE) and "content primitives" (Layer 0b: FEAR, GRIEF, JOY, ANGER), with Layer 0b consistently showing higher activation. This gap averaged +0.245 across all models. Furthermore, 11 pre-registered primitive compositions, such as WANT + GRIEF → longing/yearning, matched predicted Layer 1 concepts in three out of four models. The study also observed that this activation gap was largest in the smallest models and narrowed with increasing scale, suggesting larger models develop phenomenological access to scaffolding primitives. All experiments are reproducible locally using Ollama, with code and data available.

Key takeaway

For research scientists investigating language model architecture and emergent capabilities, this work suggests that even small LLMs possess a primitive semantic layer. You should consider exploring the identified activation gap between scaffolding and content primitives, as it may offer insights into how larger models achieve capability jumps. The reproducible experiments provide a foundation for further investigation into linguistic relationships and meaning representation.

Key insights

Small language models exhibit a consistent activation gap between scaffolding and content semantic primitives.

Principles

Content primitives activate more strongly than scaffolding primitives.
Primitive compositions can predict higher-level concepts.

Method

Probing small LLMs (360M to 1B parameters) with stimuli ranging from random phonemes to universal semantic primitives, measuring the activation gap between the two primitive types, and testing pre-registered primitive compositions against their predicted concepts.
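The gap measurement described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes each primitive has already been reduced to a single numeric activation score per model (the summary does not say how that score is obtained), and it assumes the gap is the mean Layer 0b score minus the mean Layer 0a score. The function names and the example numbers are hypothetical placeholders, not reported results.

```python
from statistics import mean

# Primitive groups as named in the summary.
SCAFFOLDING = ("SOMEONE", "TIME", "PLACE")    # Layer 0a
CONTENT = ("FEAR", "GRIEF", "JOY", "ANGER")   # Layer 0b


def activation_gap(scores):
    """Return mean content score minus mean scaffolding score.

    `scores` maps primitive name -> activation score for one model.
    Assumption: the gap is a simple difference of group means.
    """
    missing = [p for p in SCAFFOLDING + CONTENT if p not in scores]
    if missing:
        raise KeyError(f"missing scores for: {missing}")
    content_mean = mean(scores[p] for p in CONTENT)
    scaffolding_mean = mean(scores[p] for p in SCAFFOLDING)
    return content_mean - scaffolding_mean


def mean_gap_across_models(per_model):
    """Average the per-model gaps; `per_model` maps model name -> scores."""
    return mean(activation_gap(s) for s in per_model.values())


# Made-up placeholder scores, used only to exercise the functions.
example = {
    "model_a": {"SOMEONE": 0.25, "TIME": 0.25, "PLACE": 0.25,
                "FEAR": 0.5, "GRIEF": 0.5, "JOY": 0.5, "ANGER": 0.5},
    "model_b": {"SOMEONE": 0.25, "TIME": 0.25, "PLACE": 0.25,
                "FEAR": 0.75, "GRIEF": 0.75, "JOY": 0.75, "ANGER": 0.75},
}

gaps = {name: activation_gap(s) for name, s in example.items()}
overall = mean_gap_across_models(example)
```

A positive gap means the content primitives scored higher than the scaffolding primitives, which is the direction the summary reports for all four architectures.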

In practice

Reproduce experiments locally via Ollama.
Explore primitive compositions for concept generation.
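As a starting point for local experiments, the sketch below sends a single prompt to a model served by Ollama through its local HTTP endpoint (`/api/generate` on the default port 11434). It is not the study's protocol: the helper names are hypothetical, and how a prompt is worded or how a response is scored for activation are choices the summary does not specify. The repository listed under Code references is the authoritative source for the actual experiments.

```python
import json
import urllib.request

# Ollama's default local generate endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build a non-streaming generate request for a locally pulled model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=120):
    """Send the prompt and return the model's text response.

    Requires a running Ollama server with `model` already pulled.
    """
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With a server running, a composition from the summary could be tried as `generate(model_tag, "WANT + GRIEF")`, where `model_tag` is whichever small model has been pulled locally. The prompt wording here is only an assumption.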

Topics

Small Language Models
Semantic Primitives
Neural Activation
Language Model Scaling
Natural Semantic Metalanguage

Code references

dchisholm125/graph-oriented-generation

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.