Ekka: Automated Diagnosis of Silent Errors in LLM Inference

2026-06-04 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, long

Summary

Ekka is an automated diagnosis system for silent errors in complex LLM serving frameworks such as vLLM and SGLang. These errors degrade output quality without raising any explicit signal, and they are notoriously difficult to diagnose because of the semantic gap between high-level symptoms and low-level root causes. Ekka frames diagnosis as a differential debugging problem, using semantically correct reference implementations such as HuggingFace Transformers. The system systematically aligns and compares intermediate execution states between a target and a reference framework. Its multi-stage agentic workflow involves codebase analysis, model architecture mapping, activation alignment, and change-point analysis on a robust error ratio. On a benchmark of real-world silent errors, Ekka achieved 80% pass@1 and 88% pass@5 diagnosis accuracy, outperforming state-of-the-art systems with a 24% to 34% improvement. It also diagnosed 4 new silent errors, confirmed by developers, at an average cost of approximately $30 per case.

Key takeaway

For MLOps engineers and ML scientists deploying LLMs: if your serving framework shows silent output quality degradation, manual diagnosis is inefficient and costly. Consider adopting or developing automated differential debugging tools like Ekka. These systems systematically compare intermediate execution states against a trusted reference, which can reduce diagnosis time and improve accuracy. Such a solution can help prevent prolonged misdiagnosis and maintain model performance in production.

Key insights

Differential debugging against a reference implementation can automate silent error diagnosis in LLM serving frameworks.

Principles

Silent errors have diverse root causes across the LLM stack.
The semantic gap between high-level symptoms and low-level root causes hinders manual diagnosis of silent errors.
Reference implementations enable differential debugging.

Method

Ekka's method involves component mapping, activation alignment, and change-point analysis on error ratios to pinpoint divergence in execution states.
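
The change-point step can be illustrated with a minimal sketch. It is not Ekka's implementation: the summary does not define the "robust error ratio", so the relative L2 error used here is an assumption, as are the function names, the log scaling, and the two-segment least-squares split. The idea shown is only the general one: compute one error value per aligned component in execution order, then find the position where the error level shifts.

```python
import math
from typing import Sequence


def relative_error(target: Sequence[float], reference: Sequence[float],
                   eps: float = 1e-12) -> float:
    """Relative L2 error of a target activation against its reference.

    Assumption: this stands in for the paper's "robust error ratio",
    whose exact definition is not given in the summary.
    """
    if len(target) != len(reference):
        raise ValueError("activations must have the same length")
    diff_norm = math.sqrt(sum((t - r) ** 2 for t, r in zip(target, reference)))
    ref_norm = math.sqrt(sum(r * r for r in reference))
    return diff_norm / (ref_norm + eps)


def _sum_sq_dev(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean of a non-empty segment."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def find_change_point(errors: Sequence[float], floor: float = 1e-12) -> int:
    """Return the index of the first component in the high-error segment.

    Fits two constant segments to the log10 errors and picks the split
    with the lowest total squared deviation. Log scaling is an assumption,
    chosen because numerical error often spans orders of magnitude.
    """
    if len(errors) < 2:
        raise ValueError("need at least two error values")
    logs = [math.log10(max(e, floor)) for e in errors]
    best_index, best_cost = 1, math.inf
    for k in range(1, len(logs)):
        cost = _sum_sq_dev(logs[:k]) + _sum_sq_dev(logs[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index
```

Note that this sketch always returns a split, even when the errors are uniformly small, so a real pipeline would also need a test for whether the shift is significant before reporting a root-cause candidate.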

In practice

Use agentic workflows for complex code analysis.
Compare intermediate states across different frameworks.
Apply change-point analysis for error localization.
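
Comparing intermediate states across frameworks first requires pairing components that have different names in each codebase. The sketch below illustrates that alignment step under stated assumptions: activations are already captured as flat lists of floats keyed by component name, the name mapping is supplied by hand (in Ekka it comes out of the mapping stage of the workflow), and all names, the tolerance, and the max-absolute-difference metric are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Activation = Sequence[float]
AlignedPair = Tuple[str, Activation, Activation]


def align_activations(reference: Dict[str, Activation],
                      target: Dict[str, Activation],
                      name_map: Dict[str, str]) -> List[AlignedPair]:
    """Pair reference and target activations using a name mapping.

    Pairs are returned in the reference's insertion order, which is
    assumed to match execution order. Components without a mapped
    counterpart in the target are skipped.
    """
    pairs: List[AlignedPair] = []
    for ref_name, ref_values in reference.items():
        target_name = name_map.get(ref_name)
        if target_name is None or target_name not in target:
            continue
        pairs.append((ref_name, ref_values, target[target_name]))
    return pairs


def max_abs_diff(a: Activation, b: Activation) -> float:
    """Largest element-wise absolute difference between two activations."""
    if len(a) != len(b):
        raise ValueError("activations must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def first_divergence(pairs: Sequence[AlignedPair],
                     tol: float = 1e-3) -> Optional[str]:
    """Return the first reference component whose target copy differs
    by more than tol, or None if every aligned pair agrees."""
    for name, ref_values, target_values in pairs:
        if max_abs_diff(ref_values, target_values) > tol:
            return name
    return None
```

A fixed tolerance is the simplest possible criterion and tends to be brittle when frameworks use different kernels or precisions, which is one reason to prefer a change-point analysis over per-component thresholds.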

Topics

LLM Inference
Silent Errors
Differential Debugging
Serving Frameworks
Automated Diagnosis
Model Performance

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.