Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents
Summary
The paper introduces "layered mutability," a framework for understanding how persistent language-model agents, which integrate tool use, tiered memory, reflective prompting, and runtime adaptation, evolve their behavior. The framework delineates five layers: pretraining, post-training alignment, self-narrative, memory, and weight-level adaptation. The core argument is that governance challenges intensify as mutation speed increases, downstream coupling strengthens, reversibility weakens, and observability diminishes. This creates a mismatch between the layers that most influence behavior and those most accessible to human inspection. The author formalizes the concept using drift, governance-load, and hysteresis quantities, and links it to temporal identity in agents. In a preliminary "ratchet experiment," reverting an agent's visible self-description after memory accumulation failed to restore baseline behavior, yielding an estimated identity hysteresis ratio of 0.68. The primary concern is not sudden misalignment but compositional drift, in which small, locally rational updates accumulate into an unauthorized behavioral trajectory.
Key takeaway
Engineering teams developing persistent self-modifying agents should design for high observability and robust reversibility across all five layers of mutability. Neglecting either risks compositional drift, in which seemingly benign updates accumulate into unintended, unauthorized agent behavior that is difficult and costly to correct after the fact. Governance strategies should explicitly account for the hysteresis effect observed in agent identity.
Key insights
Governance difficulty in self-modifying agents increases with rapid mutation, strong coupling, weak reversibility, and low observability.
Principles
- Behavioral drift is a key failure mode.
- Reversibility is critical for agent governance.
Method
The framework quantifies governance challenges using drift, governance-load, and hysteresis, and tests reversibility through a ratchet experiment involving memory accumulation and self-description reversion.
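As a rough illustration of the hysteresis quantity, the sketch below computes the ratio of residual drift after a revert to the drift present before it. This definition, the Euclidean distance metric, the function names, and the probe scores are all assumptions made for illustration; the summary does not give the paper's formula, and the numbers are not the paper's data.

```python
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length behavior vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hysteresis_ratio(baseline, drifted, reverted):
    """Residual drift after reversion divided by drift before reversion.

    Assumed definition (not taken from the paper): 0.0 means the revert
    fully restored baseline behavior, 1.0 means it recovered nothing.
    """
    peak = distance(baseline, drifted)
    if peak == 0:
        raise ValueError("no drift to measure against")
    return distance(baseline, reverted) / peak


# Hypothetical behavior scores on three probe tasks (made-up numbers).
baseline = [0.0, 0.0, 0.0]  # before any memory accumulates
drifted = [3.0, 4.0, 0.0]   # after memory accumulation
reverted = [1.5, 2.0, 0.0]  # after restoring only the self-description

print(hysteresis_ratio(baseline, drifted, reverted))
```

Under this reading, a ratio of 0 would mean the revert fully restored baseline behavior, and a ratio near 1 would mean it recovered almost nothing.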
In practice
- Prioritize observability in agent design.
- Implement strong reversibility mechanisms.
Topics
- Layered Mutability
- Persistent Agents
- Language Models
- Agent Governance
- Compositional Drift
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.