Ekka: Automated Diagnosis of Silent Errors in LLM Inference

Source: cs.SE updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, long

Summary

Ekka is an automated diagnosis system for silent errors in complex LLM serving frameworks such as vLLM and SGLang. These errors degrade output quality without raising any explicit signal, and they are notoriously hard to diagnose because of the semantic gap between high-level symptoms and low-level root causes. Ekka frames diagnosis as a differential debugging problem: it uses a semantically correct reference implementation, such as HuggingFace Transformers, and systematically aligns and compares intermediate execution states between the target and the reference framework. Its multi-stage agentic workflow covers codebase analysis, model architecture mapping, activation alignment, and change-point analysis on a robust error ratio. On a benchmark of real-world silent errors, Ekka achieved 80% pass@1 and 88% pass@5 diagnosis accuracy, a 24% to 34% improvement over state-of-the-art systems. It also diagnosed 4 new silent errors, confirmed by developers, at an average cost of approximately $30 per case.

Key takeaway

If you are an MLOps engineer or ML scientist deploying LLMs and you see silent output quality degradation in your serving framework, manual diagnosis is inefficient and costly. Consider adopting or developing automated differential debugging tools like Ekka. These systems systematically compare intermediate execution states against a trusted reference, which can significantly reduce diagnosis time and improve accuracy. Such a tool can help prevent prolonged misdiagnosis and maintain model performance in production.

Key insights

Differential debugging against a reference implementation can automate silent error diagnosis in LLM serving frameworks.
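To make the idea concrete, here is a minimal sketch of the comparison step, assuming activations have already been captured from both frameworks and flattened into plain lists of floats. The component names, the `mapping` list, and the choice of a relative L2 error are illustrative assumptions, not Ekka's actual interface or metric.

```python
import math

def relative_error(target, reference, eps=1e-12):
    """L2 norm of (target - reference) divided by the L2 norm of reference.

    This metric is an assumption made for illustration only.
    """
    if len(target) != len(reference):
        raise ValueError("activations must have the same length")
    diff = math.sqrt(sum((t - r) ** 2 for t, r in zip(target, reference)))
    ref_norm = math.sqrt(sum(r * r for r in reference))
    return diff / (ref_norm + eps)

def compare_states(target_states, reference_states, mapping):
    """Compare mapped components in the order given by `mapping`.

    `mapping` is a list of (target_name, reference_name) pairs.
    Returns a list of (target_name, reference_name, error) tuples.
    """
    return [
        (t_name, r_name,
         relative_error(target_states[t_name], reference_states[r_name]))
        for t_name, r_name in mapping
    ]

# Hypothetical captured activations; every name and value here is made up.
reference_states = {
    "embed": [1.0, 2.0],
    "layers.0.attn": [0.5, -0.5],
    "layers.0.mlp": [3.0, 4.0],
}
target_states = {
    "embed_tokens": [1.0, 2.0],
    "layer0.self_attn": [0.5, -0.5],
    "layer0.mlp": [3.0, 4.5],  # diverges from the reference here
}
mapping = [
    ("embed_tokens", "embed"),
    ("layer0.self_attn", "layers.0.attn"),
    ("layer0.mlp", "layers.0.mlp"),
]

report = compare_states(target_states, reference_states, mapping)
for t_name, r_name, err in report:
    print(f"{t_name:18s} vs {r_name:14s} relative error = {err:.3e}")
```

In this toy run the first two components match exactly and only the last one shows a nonzero error. In a real serving stack, capturing and aligning the states is the hard part, since the two frameworks name, fuse, and shard their components differently.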

Principles

Method

Ekka maps components between the target and reference frameworks, aligns their activations, and runs change-point analysis on error ratios to pinpoint where the execution states diverge.
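The change-point step can be sketched as follows, assuming one error value per component in execution order. The summary does not define Ekka's robust error ratio, so this example substitutes a generic least-squares mean-shift detector on log-scaled errors; the function names and the sample values are hypothetical.

```python
import math

def segment_cost(values):
    """Sum of squared deviations from the segment mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def find_change_point(errors, floor=1e-12):
    """Return the index where the second segment starts.

    Tries every split of the log10 errors into two contiguous segments and
    keeps the split with the lowest total within-segment squared deviation.
    Returns None when there are fewer than two points.
    """
    if len(errors) < 2:
        return None
    # Clamp to `floor` so that exact zeros do not break the logarithm.
    logs = [math.log10(max(e, floor)) for e in errors]
    best_k, best_cost = None, float("inf")
    for k in range(1, len(logs)):
        cost = segment_cost(logs[:k]) + segment_cost(logs[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Hypothetical per-component errors in execution order (made-up values).
components = ["embed", "l0.attn", "l0.mlp", "l1.attn", "l1.mlp", "lm_head"]
errors = [1e-6, 2e-6, 1.5e-6, 3e-2, 4e-2, 3.5e-2]

k = find_change_point(errors)
print("first suspicious component:", components[k])
```

Working in log space keeps a jump of several orders of magnitude from being swamped by ordinary numerical noise. Note that this simple detector always returns some split, even when the errors never actually shift, so a practical tool would also need to check that the jump is large enough to be meaningful.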

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.