Deterministic AI at Work: The LLM Is the Most Unreliable Function in My Pipeline, So I Treat It…
Summary
This article advocates for treating large language models (LLMs) as inherently unreliable functions within a larger, deterministic software pipeline. It proposes a three-layer architecture to ensure consistent and accurate outputs for critical tasks, contrasting with the inconsistent results often seen in direct chat interactions. The first layer involves deterministic code preparing precise inputs, such as parsing ISO 20022 message definitions or Avro schemas into structured lists. The second layer assigns the LLM a narrow, "fuzzy" task, like generating plain-English descriptions from validated facts, where variation is acceptable. The final layer employs deterministic code to rigorously validate the LLM's output against predefined schemas or rules, rejecting any malformed or hallucinated elements. This method ensures the overall system's reliability and auditability, applying established software engineering principles to manage a new, articulate, yet unpredictable dependency.
Key takeaway
AI engineers building production systems with LLMs should stop trying to make the model deterministic and instead wrap it in a three-layer deterministic code pipeline. Code prepares the inputs and validates the outputs, while the LLM is confined to genuinely fuzzy tasks, so critical outputs stay consistent and auditable. This prevents shipping inconsistent or hallucinated results and turns an unreliable dependency into a controlled component of your system.
Key insights
Treat LLMs as unreliable functions within a deterministic code pipeline to achieve reliable, auditable system outputs.
Principles
- LLMs are probability engines, not deterministic functions.
- Code must handle all exact, critical tasks.
- Validate all LLM outputs mechanically.
Method
Implement a three-layer pipeline: deterministic code prepares precise inputs, the LLM performs narrow, fuzzy tasks, and deterministic code rigorously validates the LLM's output before acceptance.
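The three layers can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the original article's implementation: the function names (`prepare_fields`, `build_prompt`, `validate_descriptions`, `describe_fields`), the JSON reply format, and the retry loop are all hypothetical choices made for the example, and `call_llm` stands in for whatever model client you actually use.

```python
import json
from typing import Callable


def prepare_fields(avro_schema_json: str) -> list[str]:
    """Layer 1: deterministic code parses an Avro record schema into field names."""
    schema = json.loads(avro_schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return [field["name"] for field in schema["fields"]]


def build_prompt(fields: list[str]) -> str:
    """Give the model only validated facts and a narrow, fuzzy task."""
    return (
        "Write a one-sentence plain-English description for each field. "
        "Reply with a JSON object whose keys are exactly: " + ", ".join(fields)
    )


def validate_descriptions(raw: str, fields: list[str]) -> dict[str, str]:
    """Layer 3: accept only one non-empty description per known field."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = sorted(set(fields) - set(data))
    extra = sorted(set(data) - set(fields))  # hallucinated field names land here
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    for name, text in data.items():
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"empty or non-string description for {name!r}")
    return data


def describe_fields(
    schema_json: str,
    call_llm: Callable[[str], str],  # hypothetical: any function mapping prompt -> reply
    max_attempts: int = 3,           # assumption: retry on rejection; the source only says "reject"
) -> dict[str, str]:
    fields = prepare_fields(schema_json)
    prompt = build_prompt(fields)
    last_error = None
    for _ in range(max_attempts):
        raw = call_llm(prompt)  # Layer 2: the only non-deterministic step
        try:
            return validate_descriptions(raw, fields)
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"LLM output rejected after {max_attempts} attempts: {last_error}")
```

The wording of each description is free to vary between runs. The set of fields described is not, because it is fixed by the parser in layer 1 and enforced by the validator in layer 3.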
In practice
- Parse schemas with code, not LLMs.
- Constrain LLM tasks to human language generation.
- Use code to verify LLM-generated field mappings.
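As an illustration of the last point, a mechanical check on LLM-proposed field mappings might look like the sketch below. The function name `verify_mapping` and the example field names are hypothetical. The only assumption is that both schemas have already been parsed by code into sets of known field names.

```python
def verify_mapping(
    proposed: dict[str, str],
    source_fields: set[str],
    target_fields: set[str],
) -> tuple[dict[str, str], list[str]]:
    """Split an LLM-proposed source->target mapping into accepted pairs and rejections.

    A pair is accepted only if both field names exist in the parsed schemas;
    anything else is treated as hallucinated and reported, never passed through.
    """
    accepted: dict[str, str] = {}
    rejected: list[str] = []
    for source, target in proposed.items():
        if source not in source_fields:
            rejected.append(f"unknown source field: {source}")
        elif target not in target_fields:
            rejected.append(f"unknown target field: {target}")
        else:
            accepted[source] = target
    return accepted, rejected


# Hypothetical example: the model invents a target field that is not in the schema.
accepted, rejected = verify_mapping(
    proposed={"amt": "amount", "ccy": "currency", "dbtr": "debtor_full_name"},
    source_fields={"amt", "ccy", "dbtr"},
    target_fields={"amount", "currency", "debtor_name"},
)
```

Here `accepted` keeps the two verifiable pairs, and `rejected` records the invented `debtor_full_name` target so it can be logged or sent back for another attempt.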
Topics
- Large Language Models
- Deterministic AI
- Software Pipelines
- Output Validation
- System Reliability
- Prompt Engineering
Best for: AI Engineer, Machine Learning Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by LLM on Medium.