LLM-as-an-Investigator: Evidence-First Reasoning for Robust Interactive Problem Diagnosis
Summary
The LLM-as-an-Investigator methodology addresses "user-driven sycophancy" in large language models (LLMs) used for technical problem diagnosis: the tendency to accept a user-provided hypothesis prematurely, without sufficient evidence. It employs a Solution Investigator Agent (SIA) that estimates problem ambiguity, generates competing hypotheses, asks targeted clarification questions, and updates hypothesis probabilities based on the user's answers. The agent keeps investigating until one explanation is substantially stronger than the others. On a benchmark of 303 solved technical forum threads across mechanical, electrical, and hydraulic domains, SIA substantially improved diagnostic accuracy. For gpt-5.5, SIA-top achieved 69.53% accuracy compared to 36.48% for the Base Assistant (BAS) and 46.56% for the Thinking Assistant (THK). For gemini-3.5-flash, SIA-top reached 66.00% versus 33.07% for BAS and 42.17% for THK. The approach was also robust to misleading user hypotheses, which standard assistants rarely challenged spontaneously.
Key takeaway
AI engineers building diagnostic or technical-support LLMs should implement agentic frameworks that prioritize evidence-first reasoning. Relying solely on direct prompting or reasoning-oriented LLMs risks user-driven sycophancy, which leads to inaccurate diagnoses and wasted resources. Integrate explicit hypothesis generation, targeted questioning, and probability updating into your agents to make problem identification robust and to build user trust.
Key insights
LLM-as-an-Investigator uses evidence-first reasoning to counter user-driven sycophancy in technical problem diagnosis.
Principles
- Treat user suggestions as hypotheses.
- Separate semantic reasoning from control.
- Iteratively reduce uncertainty.
Method
The Solution Investigator Agent estimates ambiguity, generates candidate solutions, asks discriminative questions, and updates hypothesis probabilities until a confidence threshold (e.g., τ=0.90) is met or the question budget is exhausted.
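The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the Bayes-rule update, the `Question` class, the `investigate` function, the fixed question order, and every prior and likelihood value are hypothetical stand-ins. In a real agent, an LLM would presumably supply the hypotheses, the questions, and the likelihood estimates, while plain code like this handles the stopping rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Question:
    text: str
    # likelihoods[hypothesis][answer] = P(answer | hypothesis).
    # Hypothetical numbers; an LLM would estimate these in practice.
    likelihoods: Dict[str, Dict[str, float]]


def bayes_update(probs: Dict[str, float], q: Question, answer: str) -> Dict[str, float]:
    """Reweight each hypothesis by how well it predicts the answer, then normalize."""
    unnorm = {h: p * q.likelihoods[h][answer] for h, p in probs.items()}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}


def investigate(
    priors: Dict[str, float],
    questions: List[Question],
    ask: Callable[[str], str],
    tau: float = 0.90,
    budget: int = 3,
) -> Tuple[str, Dict[str, float], int]:
    """Ask questions until the top hypothesis reaches tau or the budget runs out."""
    probs = dict(priors)
    asked = 0
    for q in questions:
        if max(probs.values()) >= tau or asked >= budget:
            break
        probs = bayes_update(probs, q, ask(q.text))
        asked += 1
    top = max(probs, key=probs.get)
    return top, probs, asked


# The user's own guess ("dead_battery") enters as one hypothesis among several.
priors = {"dead_battery": 0.60, "bad_starter": 0.25, "corroded_terminal": 0.15}
questions = [
    Question("Do the dashboard lights stay bright when you turn the key?", {
        "dead_battery": {"yes": 0.05, "no": 0.95},
        "bad_starter": {"yes": 0.90, "no": 0.10},
        "corroded_terminal": {"yes": 0.30, "no": 0.70},
    }),
    Question("Do you hear a single click when you turn the key?", {
        "dead_battery": {"yes": 0.20, "no": 0.80},
        "bad_starter": {"yes": 0.90, "no": 0.10},
        "corroded_terminal": {"yes": 0.10, "no": 0.90},
    }),
    Question("Is there visible corrosion on the battery terminals?", {
        "dead_battery": {"yes": 0.30, "no": 0.70},
        "bad_starter": {"yes": 0.20, "no": 0.80},
        "corroded_terminal": {"yes": 0.90, "no": 0.10},
    }),
]
# Scripted user answers stand in for a live conversation.
scripted = {questions[0].text: "yes", questions[1].text: "yes", questions[2].text: "no"}

top, probs, asked = investigate(priors, questions, ask=lambda text: scripted[text])
print(top, asked, round(probs[top], 3))
```

In this toy run the user's initial guess starts with the highest prior, but two answers that contradict it shift most of the probability mass to a competing hypothesis, and the loop stops before the third question because the threshold is already met. A fuller version would choose the next question by how well it discriminates between the leading hypotheses rather than by list order.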
In practice
- Implement an external control loop for LLMs.
- Use a three-agent evaluation pipeline.
- Develop domain-specific troubleshooting benchmarks.
Topics
- LLM Agents
- Problem Diagnosis
- User-Driven Sycophancy
- Evidence-First Reasoning
- Hypothesis Evaluation
- Technical Troubleshooting
Best for: Research Scientist, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.