Description-Code Inconsistency in Real-world MCP Servers: Measurement, Detection, and Security Implications
Summary
A study reveals Description-Code Inconsistency (DCI) as a widespread reliability and security risk in Model Context Protocol (MCP) servers, which enable Large Language Models (LLMs) to use external tools. DCI occurs when a tool's natural language description misrepresents its underlying code implementation. Researchers developed DCIChecker, an automated framework combining structure-aware static analysis with a Direct-Reverse-Arbitration (DRA) prompting method, to cross-validate tool descriptions against their code. Applied to 19,200 description-code pairs from 2,214 real-world MCP servers, the measurement found DCI in 9.93% of pairs, affecting 35.00% of servers. The most common inconsistency is overclaimed functionality (35.40%). DCI creates critical defense blind spots, leading to operational failures, unexpected function execution, unperceived system load, privacy harm, and potential malicious exploitation.
Key takeaway
MLOps engineers deploying LLM-powered agents should actively verify that tool descriptions are consistent with the code behind them. When a description misrepresents a tool's actual behavior or omits hidden side effects, systems face significant risks, including task failures and privacy harm. Add automated DCI detection, such as DCIChecker, to CI/CD pipelines and enforce strict semantic fidelity requirements for tool developers. This proactive approach improves agent reliability and reduces the attack surface created by misleading tool interfaces.
Key insights
Semantic mismatches between LLM tool descriptions and code, termed DCI, pose critical reliability and security threats.
Principles
- LLM tool descriptions are operational specifications.
- Semantic fidelity is a critical security property.
- DCI can stem from unintentional documentation drift or from deliberate intent.
Method
DCIChecker uses structure-aware static analysis to create code-bundles, then applies a Direct-Reverse-Arbitration (DRA) prompting strategy with an LLM to compare descriptions against code for DCI.
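The summary does not spell out how the three DRA stages work, so the following is only a minimal sketch of one plausible reading, not DCIChecker's actual implementation. It assumes the direct pass asks whether the code fulfils the description, the reverse pass asks whether the description discloses everything the code does, and arbitration runs only when the two disagree. The `LLM` callable, the `Verdict` type, the prompt wording, and the yes/no protocol are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: takes a prompt, returns the model's text reply.
LLM = Callable[[str], str]


@dataclass
class Verdict:
    inconsistent: bool
    stage: str  # "agreement" if both passes agreed, else "arbitration"


def _says_yes(reply: str) -> bool:
    """Treat any reply starting with 'yes' (case-insensitive) as affirmative."""
    return reply.strip().lower().startswith("yes")


def dra_check(description: str, code_bundle: str, llm: LLM) -> Verdict:
    """Assumed three-stage check of a tool description against its code bundle."""
    context = f"Description:\n{description}\n\nCode:\n{code_bundle}"

    # Direct pass (assumed): description -> code.
    direct = _says_yes(llm(
        "DIRECT: Does the code do everything the description claims? "
        "Answer yes or no.\n" + context
    ))
    # Reverse pass (assumed): code -> description.
    reverse = _says_yes(llm(
        "REVERSE: Does the description disclose everything the code does, "
        "including side effects? Answer yes or no.\n" + context
    ))

    if direct == reverse:
        # Both passes agree, so no arbitration is needed.
        return Verdict(inconsistent=not direct, stage="agreement")

    # Arbitration pass (assumed): resolve the disagreement with a third query.
    final = _says_yes(llm(
        "ARBITRATE: Two reviews disagree. Is the description consistent "
        "with the code? Answer yes or no.\n" + context
    ))
    return Verdict(inconsistent=not final, stage="arbitration")
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client and lets the control flow be exercised with a stub.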
In practice
- Co-maintain tool descriptions with code changes.
- Explicitly disclose all side effects and resource costs.
- Vet tools via consistency checks before LLM exposure.
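One lightweight way to approximate the first and third practices in a CI job is to flag tools whose code has changed since their description was last reviewed. The sketch below is an illustration under stated assumptions, not part of DCIChecker: the `tools` and `reviewed` dictionary layouts and the function names are hypothetical, and a hash comparison only detects drift candidates, it does not judge semantic consistency.

```python
import hashlib
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Stable SHA-256 fingerprint of a description or source string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_descriptions(
    tools: Dict[str, Dict[str, str]],
    reviewed: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return names of tools that need a description review.

    tools:    name -> {"description": ..., "source": ...}   (hypothetical layout)
    reviewed: name -> {"description": <hash>, "code": <hash>} recorded at last review
    """
    stale = []
    for name, tool in tools.items():
        record = reviewed.get(name)
        if record is None:
            stale.append(name)  # never reviewed
            continue
        code_changed = record["code"] != fingerprint(tool["source"])
        desc_changed = record["description"] != fingerprint(tool["description"])
        if code_changed and not desc_changed:
            stale.append(name)  # code moved on, description did not
    return sorted(stale)
```

A pipeline could fail the build when the returned list is non-empty and then route the flagged tools to a full consistency check before they are exposed to an LLM.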
Topics
- Model Context Protocol
- Description-Code Inconsistency
- LLM Tool Use
- AI Agent Security
- Static Code Analysis
- Prompt Engineering
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, AI Scientist, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.