Your model is probabilistic. Your system of record can’t be.

· Source: AI Advances - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, long

Summary

The article contrasts the probabilistic nature of large language models (LLMs) with the deterministic behavior that a system of record requires. An LLM's final layer applies a softmax with a "temperature" knob (T) to turn scores into a probability distribution from which tokens are sampled, which means the "most likely token is not the correct token." Setting T=0 (greedy decoding) always selects the highest-probability token, which buys repeatability but not correctness: the choice still reflects plausibility under the model's distribution, not verified truth. The author argues that integrating LLMs into software without acknowledging this is a "category error" that treats a sampler as a deterministic function. The core solution is to draw a clear line between outputs where probabilism is acceptable (advisory, transient, bounded blast radius) and outputs that require determinism (shared state, automation, auditable). For the latter, the article proposes a "membrane" architecture with five steps: propose, gate, pin, execute deterministically, and account. The model only suggests, while a deterministic system decides.
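
To make the temperature mechanism concrete, here is a minimal sketch of temperature-scaled softmax, sampling, and greedy decoding. It is an illustration, not the article's code: the function names, the three-token vocabulary, and the logit values are all hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use greedy() for the T=0 case")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits, temperature, rng):
    """Draw one token index at random, weighted by the softmax probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def greedy(logits):
    """The T=0 case: always return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical scores for three candidate tokens (assumed values, not real model output).
VOCAB = ["2023", "2024", "2025"]
LOGITS = [2.0, 1.5, 0.2]

if __name__ == "__main__":
    rng = random.Random(0)
    print("greedy:", VOCAB[greedy(LOGITS)])
    print("sampled:", [VOCAB[sample(LOGITS, 1.0, rng)] for _ in range(5)])
```

In this sketch `greedy` returns the same index on every call, which is the repeatability the article describes. Nothing in the code checks whether that token is actually correct, which is the gap the article points to.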

Key takeaway

AI engineers and architects integrating LLMs into production systems must explicitly separate probabilistic model outputs from deterministic system-of-record requirements. Do not rely on temperature settings for correctness; instead, build a "membrane" around critical outputs: the model proposes, and deterministic gates validate, pin, and execute the result. This keeps results auditable and reproducible, at the cost of added latency and engineering overhead.

Key insights

The core challenge in integrating LLMs is managing their inherent probabilistic nature within deterministic software systems.

Method

The article proposes a "membrane" architecture for outputs requiring determinism: propose, gate, pin, execute deterministically, and account. This ensures the model suggests, but a deterministic system decides and records.
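
The five steps can be sketched as a small pipeline. This is one possible reading, not the article's implementation: the refund scenario, the field names, the limits, and every function name are hypothetical, and "pin" is interpreted here as freezing the validated proposal into canonical JSON with a hash so the exact decision can be replayed and audited.

```python
import hashlib
import json
from dataclasses import dataclass

ALLOWED_ACTIONS = {"refund"}   # assumed allow-list for this example
MAX_REFUND_CENTS = 10_000      # assumed business limit for this example

@dataclass(frozen=True)
class Proposal:
    action: str
    order_id: str
    amount_cents: int

def gate(raw):
    """Gate: deterministic validation of the model's proposal; raises ValueError if off-schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("not valid JSON") from exc
    if not isinstance(data, dict) or set(data) != {"action", "order_id", "amount_cents"}:
        raise ValueError("unexpected fields")
    action, order_id, amount = data["action"], data["order_id"], data["amount_cents"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError("action not allowed")
    if not isinstance(order_id, str) or not order_id:
        raise ValueError("bad order_id")
    if isinstance(amount, bool) or not isinstance(amount, int):
        raise ValueError("amount must be an integer")
    if not 0 < amount <= MAX_REFUND_CENTS:
        raise ValueError("amount out of bounds")
    return Proposal(action, order_id, amount)

def pin(proposal):
    """Pin: serialize the accepted proposal to canonical JSON so it can be stored and replayed."""
    fields = {"action": proposal.action, "order_id": proposal.order_id,
              "amount_cents": proposal.amount_cents}
    return json.dumps(fields, sort_keys=True, separators=(",", ":"))

def execute(proposal, ledger):
    """Execute: plain deterministic code changes the system of record; the model never does."""
    ledger[proposal.order_id] = ledger.get(proposal.order_id, 0) + proposal.amount_cents

def handle(raw, ledger, audit_log):
    """Run one model proposal (raw text) through gate, pin, execute, and account."""
    try:
        proposal = gate(raw)
    except ValueError as exc:
        audit_log.append({"status": "rejected", "reason": str(exc), "raw": raw})
        return False
    pinned = pin(proposal)
    execute(proposal, ledger)
    digest = hashlib.sha256(pinned.encode("utf-8")).hexdigest()
    audit_log.append({"status": "executed", "pinned": pinned, "sha256": digest})
    return True
```

Here the "propose" step is whatever text the model returns, passed in as `raw`. A well-formed proposal updates `ledger` and leaves an audit entry, while anything off-schema or out of bounds is recorded as rejected and leaves the ledger untouched.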

Best for: AI Engineer, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Advances - Medium.