Ekka: Automated Diagnosis of Silent Errors in LLM Inference
Summary
Ekka is an automated diagnosis system for silent errors in complex LLM serving frameworks such as vLLM and SGLang. These errors degrade output quality without raising any explicit signal, and the semantic gap between high-level symptoms and low-level root causes makes them notoriously difficult to diagnose. Ekka frames diagnosis as a differential debugging problem, using a semantically correct reference implementation such as HuggingFace Transformers. The system systematically aligns and compares intermediate execution states between the target framework and the reference. Its multi-stage agentic workflow covers codebase analysis, model architecture mapping, activation alignment, and change-point analysis on a robust error ratio. On a benchmark of real-world silent errors, Ekka achieved 80% pass@1 and 88% pass@5 diagnosis accuracy, a 24% to 34% improvement over state-of-the-art systems. It also diagnosed 4 new silent errors, all confirmed by developers, at an average cost of approximately $30 per case.
Key takeaway
If you are an MLOps engineer or ML scientist deploying LLMs and your serving framework shows silent degradation in output quality, manual diagnosis is slow and costly. Consider adopting or building an automated differential debugging tool like Ekka. Such systems systematically compare intermediate execution states against a trusted reference, which shortens diagnosis time and improves accuracy. Putting one in place helps avoid prolonged misdiagnosis and protects model quality in production.
Key insights
Differential debugging against a reference implementation can automate silent error diagnosis in LLM serving frameworks.
Principles
- Silent errors have diverse root causes across the LLM stack.
- The semantic gap between symptoms and root causes hinders manual diagnosis of silent errors.
- Reference implementations enable differential debugging.
Method
Ekka's method involves component mapping, activation alignment, and change-point analysis on error ratios to pinpoint divergence in execution states.
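The mapping and alignment steps can be illustrated with a minimal sketch. It is not Ekka's implementation: the component names, the `name_map` dictionary, and the choice of a median relative error as the "error ratio" are all assumptions made for illustration, and activations are represented as flat lists of floats to keep the example free of dependencies.

```python
from statistics import median


def error_ratio(target, reference, eps=1e-8):
    """Median element-wise relative error between two flat activation lists.

    Assumption: the median is used here as one simple way to be robust to
    a few outlier elements; the summary does not define Ekka's exact metric.
    """
    if len(target) != len(reference):
        raise ValueError("activation shapes differ")
    return median(abs(t - r) / (abs(r) + eps) for t, r in zip(target, reference))


def align_and_compare(target_acts, reference_acts, name_map):
    """Return [(target_name, error_ratio)] in target execution order.

    target_acts / reference_acts: dicts of component name -> flat activations,
    inserted in execution order. name_map: hypothetical mapping from
    target-framework component names to reference component names.
    """
    results = []
    for t_name, t_values in target_acts.items():
        r_name = name_map.get(t_name)
        if r_name is None or r_name not in reference_acts:
            continue  # unmapped component: skip rather than guess a match
        results.append((t_name, error_ratio(t_values, reference_acts[r_name])))
    return results


# Hypothetical example: the third component diverges from the reference.
target_acts = {"t.embed": [1.0, 2.0], "t.attn0": [1.0, 2.0], "t.mlp0": [2.0, 4.0]}
reference_acts = {"r.embed": [1.0, 2.0], "r.attn0": [1.0, 2.0], "r.mlp0": [1.0, 2.0]}
name_map = {"t.embed": "r.embed", "t.attn0": "r.attn0", "t.mlp0": "r.mlp0"}

for name, ratio in align_and_compare(target_acts, reference_acts, name_map):
    print(f"{name}: {ratio:.3g}")
```

The output is an ordered sequence of per-component error ratios, which is the kind of signal a localization step can then scan for the point where the target starts to diverge.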
In practice
- Use agentic workflows for complex code analysis.
- Compare intermediate states across different frameworks.
- Apply change-point analysis for error localization.
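As a rough illustration of the last practice, the sketch below locates a single change point in a sequence of per-component error values. It assumes a simple least-squares split on log-scaled errors, which is a generic technique chosen for illustration and not necessarily the one Ekka uses; the function names and the sample values are hypothetical.

```python
import math


def _sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def find_change_point(error_ratios, floor=1e-12):
    """Return the index where the error level most plausibly shifts.

    The sequence is split into a left and a right segment, each modeled as a
    constant level on a log10 scale; the split with the lowest total squared
    error wins. The returned index is the first element of the right segment.
    """
    if len(error_ratios) < 2:
        raise ValueError("need at least two points to split")
    logs = [math.log10(max(e, floor)) for e in error_ratios]  # floor avoids log(0)
    best_k, best_cost = None, math.inf
    for k in range(1, len(logs)):
        cost = _sse(logs[:k]) + _sse(logs[k:])
        if cost < best_cost:  # strict '<' keeps the earliest split on ties
            best_k, best_cost = k, cost
    return best_k


# Hypothetical per-component errors: numerical noise, then a sustained jump.
ratios = [1e-6, 2e-6, 1e-6, 3e-2, 5e-2, 4e-2]
print(find_change_point(ratios))
```

Note that this sketch always returns a split, even for a sequence with no real shift, so a practical tool would also need a threshold on how much the two segment levels differ before reporting a divergence.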
Topics
- LLM Inference
- Silent Errors
- Differential Debugging
- Serving Frameworks
- Automated Diagnosis
- Model Performance
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.