The Shape of Wisdom: Decision Trajectories in Language Models

2026-05-31 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

The study "The Shape of Wisdom" analyzes 9,000 decision trajectories from Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, and Mistral-7B-Instruct-v0.3 on the MMLU benchmark. It describes each trajectory by three quantities: its current answer margin, the next-layer change in that margin, and its distance from a decision flip. A key finding is that correctness and stability are distinct properties: the largest group of answers is "unstable-correct." On a traced subset, the average attention scalar in stable-correct cases points in the correct direction, while the average MLP scalar does not. Span deletion experiments show that removing answer-supporting text hurts the margin and removing distractor-like text helps it. The work offers a reproducible method for telling settled answers from fragile ones and for identifying what influences them.

Key takeaway

For machine learning engineers evaluating language model robustness, understanding decision trajectories reveals which answers are fragile despite being correct. You should analyze these trajectories to identify and mitigate potential failure points in critical applications, especially when deploying models like Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, or Mistral-7B-Instruct-v0.3. This approach helps ensure model reliability beyond simple accuracy metrics.

Key insights

Correctness and stability are distinct properties of language model decision trajectories.

Principles

LM answers evolve across layers, not just at output.
Unstable-correct answers form the largest group.
In stable-correct cases on the traced subset, the average attention scalar points toward the correct answer; the average MLP scalar does not.
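
To make the distinction between correctness and stability concrete, here is a minimal sketch that labels a single trajectory from its per-layer margins. The summary does not give the study's definition of "stable," so the rule used here (no sign change of the margin within the last `tail` layers) and the names `classify_trajectory` and `tail` are assumptions made for illustration only.

```python
import numpy as np

def classify_trajectory(margins, tail=4):
    """Label a per-layer margin trajectory.

    margins: 1-D sequence; margin of the correct option at each layer
             (positive means the correct option currently leads).
    tail:    hypothetical window size. ASSUMPTION: an answer counts as
             "stable" when the margin keeps one sign over the last
             `tail` layers. The study may define stability differently.
    """
    m = np.asarray(margins, dtype=float)
    correct = m[-1] > 0                       # final-layer decision is right
    signs = np.sign(m[-tail:])
    stable = bool(np.all(signs == signs[-1])) # no flip inside the window
    return f"{'stable' if stable else 'unstable'}-{'correct' if correct else 'wrong'}"

# Two answers that are both correct at the output but differ in stability.
print(classify_trajectory([-0.5, -0.1, 0.4, 0.9, 1.3, 1.6]))  # stable-correct
print(classify_trajectory([0.8, 0.3, -0.2, 0.1, -0.1, 0.2]))  # unstable-correct
```

Both toy trajectories would score identically on accuracy; only the second keeps changing its mind in the late layers.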

Method

Each trajectory is described by its answer margin, the next-layer change in that margin, and its distance from a decision flip. A traced subset is then used to analyze attention and MLP scalar contributions and the effects of span deletion.
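
The three trajectory descriptors can be sketched as follows, assuming per-layer scores for each answer option are already available (the summary does not say how the study reads them out). Everything here is an illustrative assumption: the function name `describe_trajectory`, the margin defined as the correct option's score minus the best competing score, and "distance from a decision flip" read as the absolute margin, that is, how far the margin would have to move to cross zero.

```python
import numpy as np

def describe_trajectory(option_logits, correct_idx):
    """Compute per-layer descriptors for one question.

    option_logits: array of shape (n_layers, n_options) holding a score
                   for each answer option at each layer (assumed input).
    correct_idx:   index of the correct option.
    """
    logits = np.asarray(option_logits, dtype=float)
    correct = logits[:, correct_idx]
    others = np.delete(logits, correct_idx, axis=1)

    # ASSUMPTION: margin = correct score minus strongest competitor.
    margin = correct - others.max(axis=1)

    # Next-layer change; the last layer has no successor, so it gets NaN.
    delta = np.append(np.diff(margin), np.nan)

    # ASSUMPTION: distance from a flip = |margin| (distance to zero).
    flip_distance = np.abs(margin)
    return margin, delta, flip_distance

# Toy example: 3 layers, 4 options, option 0 is correct.
toy = [[1.0, 2.0, 0.5, 0.0],
       [1.5, 1.0, 0.5, 0.0],
       [3.0, 1.0, 0.5, 0.0]]
margin, delta, flip_distance = describe_trajectory(toy, correct_idx=0)
print(margin)         # margins per layer: -1.0, 0.5, 2.0
print(delta)          # changes per layer: 1.5, 1.5, nan
print(flip_distance)  # distances per layer: 1.0, 0.5, 2.0
```

In the toy input the decision flips between the first and second layer, where the margin changes sign.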

In practice

Identify fragile LM answers before deployment.
Pinpoint text segments influencing LM decisions.
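
A span deletion check of the kind described in the summary might look like the sketch below. It is not the study's implementation: `span_deletion_effects` and `toy_margin` are hypothetical names, and `toy_margin` is a stand-in for a real model call that would return the final answer margin for a prompt.

```python
from typing import Callable, Dict, Sequence, Tuple

def span_deletion_effects(
    tokens: Sequence[str],
    spans: Sequence[Tuple[int, int]],
    margin_fn: Callable[[Sequence[str]], float],
) -> Dict[Tuple[int, int], float]:
    """Return the margin change caused by deleting each [start, end) span.

    A negative value means the deleted text was supporting the answer;
    a positive value means it was acting like a distractor.
    """
    base = margin_fn(tokens)
    effects = {}
    for start, end in spans:
        ablated = list(tokens[:start]) + list(tokens[end:])
        effects[(start, end)] = margin_fn(ablated) - base
    return effects

def toy_margin(tokens: Sequence[str]) -> float:
    # Stand-in scorer (ASSUMPTION): +1 per supporting word, -1 per
    # distractor word. A real setup would query the language model here.
    return float(sum(t == "paris" for t in tokens) - sum(t == "lyon" for t in tokens))

tokens = ["the", "capital", "is", "paris", "not", "lyon"]
effects = span_deletion_effects(tokens, [(3, 4), (5, 6)], toy_margin)
print(effects)  # {(3, 4): -1.0, (5, 6): 1.0}
```

Spans with large negative effects are the ones the answer depends on, so they are natural candidates to inspect when an answer looks fragile.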

Topics

Language Models
Decision Trajectories
MMLU Benchmark
Model Stability
Attention Mechanisms
MLP Layers
Model Interpretability

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.