Keep Deterministic Work Deterministic
Summary
An analysis of an LLM-driven blackjack simulation illustrates the "March of Nines" problem: raising reliability by another nine (for example, from 90% to 99%) requires disproportionate engineering effort, largely because failures cascade. In the initial runs, where an LLM played hands using strategies written in plain English, the pass rate was 31%. The author demonstrates the compounding error with an 8-prompt exercise using ChatGPT 5.3 Instant, in which a single early miscalculation derails the whole sequence and produces an incorrect final score. The article emphasizes that LLMs struggle with deterministic tasks such as counting characters, because they operate on tokens rather than individual characters, and that this weakness makes multi-step pipelines susceptible to cascading failures. Over eight versions of the blackjack pipeline, the author raised the pass rate from 31% to 94% by moving deterministic work (such as card dealing and strategy validation) into explicit code and applying structural constraints (such as Chain of Thought and rigid output formats) to the remaining LLM calls.
Key takeaway
For AI engineers building multi-step LLM workflows: identify the deterministic tasks first and offload them to conventional code. Making steps such as data validation or rule application deterministic yields significantly higher pipeline reliability and simpler debugging. For tasks that don't require LLM judgment, this is more effective than extensive prompt engineering; in the article, replacing an LLM validator with a 10-line script produced a 31% jump in pass rate.
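To make the idea of a short deterministic validator concrete, here is a minimal sketch. It is not the author's script: the function names, the card representation, and the simple "stand on 17 or higher" rule are all assumptions chosen for illustration.

```python
def hand_value(cards):
    """Best blackjack total for a list of ranks such as ["A", "7", "K"]."""
    total = 0
    aces = 0
    for rank in cards:
        if rank == "A":
            total += 11
            aces += 1
        elif rank in ("J", "Q", "K"):
            total += 10
        else:
            total += int(rank)
    # Count an ace as 1 instead of 11 while the hand would otherwise bust.
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total


def expected_action(cards, stand_on=17):
    """Hypothetical strategy rule: stand at `stand_on` or above, else hit."""
    return "stand" if hand_value(cards) >= stand_on else "hit"


def validate_move(cards, action, stand_on=17):
    """True when the action taken matches what the rule prescribes."""
    return action == expected_action(cards, stand_on)
```

Because the rule is plain arithmetic and comparison, the check returns the same answer on every run, which an LLM-based validator cannot guarantee.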
Key insights
Deterministic tasks in LLM pipelines should be handled by code to prevent cascading failures and improve reliability.
Principles
- Each "nine" of reliability costs as much as the last.
- Deterministic work should be handled by deterministic code.
- Cascading failures are inherent in chained LLM operations.
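The compounding behind these principles can be sketched with a simple model. It assumes that every step succeeds independently with the same probability and that any single failure ruins the final result; real pipelines are messier, and the function names here are illustrative only.

```python
def pipeline_success_rate(step_reliability, steps):
    """End-to-end success rate when every step must succeed.

    Assumes independent steps with identical per-step reliability.
    """
    return step_reliability ** steps


def required_step_reliability(target, steps):
    """Per-step reliability needed to reach a target end-to-end rate."""
    return target ** (1 / steps)


# Eight chained steps at 90% each succeed end to end about 43% of the time.
eight_steps_at_90 = pipeline_success_rate(0.90, 8)

# The same eight steps at 99% each succeed about 92% of the time.
eight_steps_at_99 = pipeline_success_rate(0.99, 8)
```

Under this model, each step replaced with deterministic code contributes a factor of 1.0 to the product, so it no longer drags down the end-to-end rate.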
Method
Improve LLM pipeline reliability by identifying and replacing deterministic LLM steps with code, and applying structural constraints like Chain of Thought for remaining LLM-dependent steps.
In practice
- Use code for arithmetic, string matching, and rule evaluation.
- Implement Chain of Thought to reduce LLM errors.
- Employ rigid output formats to guide LLM responses.
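One way to combine the last two practices is to let the model reason freely and then require a strictly formatted final line that code can check. The sketch below assumes a hypothetical convention in which the response must end with `ACTION: hit` or `ACTION: stand`; the format and the function name are illustrative and are not taken from the article.

```python
import re

# The final non-empty line must match this pattern exactly (case-sensitive).
ACTION_LINE = re.compile(r"^ACTION: (hit|stand)$")


def parse_action(response):
    """Return "hit" or "stand" from the last line, or None if malformed."""
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    if not lines:
        return None
    match = ACTION_LINE.match(lines[-1])
    return match.group(1) if match else None


# Free-form reasoning first, then the rigid final line.
reply = "The hand totals 16, which is below 17.\nACTION: hit"
action = parse_action(reply)
```

Returning `None` for anything off-format lets the caller retry or fail loudly instead of passing a guessed value to the next step.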
Topics
- LLM Reliability
- Agentic Engineering
- Cascading Failures
- LLM Pipelines
- Deterministic Systems
Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AI & ML – Radar.