Internal Representation, Not Clinical Knowledge: Where Apparent LLM Triage Failures Originate

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A study of apparent large language model (LLM) triage failures, in particular the high under-triage rates reported for consumer LLMs when output is constrained to multiple choice rather than free text, concludes that the failures stem from the output format, not from a lack of clinical knowledge. Using sparse-autoencoder (SAE) features in Gemma 3 4B/12B IT and Qwen3-8B, the researchers found that medical features activate on the clinical narrative under both formats but fall silent at the multiple-choice decision token. Three independent methods (natural-language autoencoder verbalization, decision-token logit attribution, and top-feature characterization) confirmed that scaffold and format features, not medical features, drive the decision logits. The multiple-choice penalty inverts under structured and natural-language input, and the failures are dominated by "off-by-one" errors.

Key takeaway

If you evaluate LLM clinical triage performance, recognize that apparent knowledge failures often reflect output-format biases rather than a deficit in the model's internal clinical understanding. Investigate how scaffold and format features influence the decision logits, especially when comparing multiple-choice with free-text outputs. This shifts diagnostic effort from knowledge retrieval to the model's decision-mapping mechanisms, where "off-by-one" errors may turn out to be the primary failure mode.
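As a rough illustration of how under-triage and "off-by-one" errors could be tallied when comparing output formats, here is a minimal Python sketch. It assumes triage levels are coded as integers on an ordinal scale where a higher number means more urgent, and it reads "off-by-one" as a prediction one level away from the reference. The function name, the coding, and the sample values are all hypothetical and are not taken from the study.

```python
from collections import Counter


def triage_error_profile(gold, pred):
    """Tally triage outcomes on an ordinal urgency scale.

    Assumption (not from the study): levels are integers and a higher
    number means a more urgent disposition.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    counts = Counter()
    for g, p in zip(gold, pred):
        diff = p - g
        if diff == 0:
            counts["correct"] += 1
            continue
        # Predicting a less urgent level than the reference is under-triage.
        counts["under_triage" if diff < 0 else "over_triage"] += 1
        if abs(diff) == 1:
            counts["off_by_one"] += 1
    errors = counts["under_triage"] + counts["over_triage"]
    return {
        "n": len(gold),
        "correct": counts["correct"],
        "under_triage": counts["under_triage"],
        "over_triage": counts["over_triage"],
        "off_by_one": counts["off_by_one"],
        "off_by_one_share": counts["off_by_one"] / errors if errors else 0.0,
    }


# Invented labels for illustration only.
gold_levels = [3, 2, 1, 0, 3]
mcq_levels = [2, 2, 1, 1, 0]
profile = triage_error_profile(gold_levels, mcq_levels)
print(profile)
```

Running the same tally once on multiple-choice predictions and once on labels extracted from free-text answers would give a like-for-like view of where the two formats diverge.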

Key insights

LLM clinical triage failures originate in output format mechanisms, not internal clinical knowledge representation.

Method

The study used sparse-autoencoder (SAE) features, natural-language autoencoder verbalization, decision-token logit attribution, and top-feature characterization to analyze the models' internal representations.
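To make decision-token logit attribution more concrete, the sketch below shows one common linear form of it, which may differ from the study's exact procedure. Each SAE feature's contribution to the logit difference between two answer options is approximated as its activation times the dot product of its decoder direction with the difference of the two unembedding vectors. This ignores the final normalization layer and any SAE reconstruction error. All names, the "medical" and "format" group labels, and the toy numbers are hypothetical.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def feature_logit_attribution(activations, decoder_rows, unembed_a, unembed_b):
    """Approximate each feature's contribution to logit(a) - logit(b).

    Assumption: the residual stream at the decision token maps linearly
    to the logits, so contribution_i = act_i * (decoder_i . (W_a - W_b)).
    """
    direction = [a - b for a, b in zip(unembed_a, unembed_b)]
    return [act * dot(row, direction)
            for act, row in zip(activations, decoder_rows)]


def share_by_group(attributions, groups):
    """Fraction of total absolute attribution carried by each feature group."""
    totals = {}
    for attr, group in zip(attributions, groups):
        totals[group] = totals.get(group, 0.0) + abs(attr)
    grand = sum(totals.values())
    return {g: (t / grand if grand else 0.0) for g, t in totals.items()}


# Toy two-dimensional example with invented values.
activations = [0.5, 0.0, 2.0, 1.0]
groups = ["medical", "medical", "format", "format"]
decoder_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.5], [1.0, -1.0]]
unembed_a = [1.0, 0.0]  # hypothetical unembedding vector for option A
unembed_b = [0.0, 1.0]  # hypothetical unembedding vector for option B

attributions = feature_logit_attribution(
    activations, decoder_rows, unembed_a, unembed_b
)
shares = share_by_group(attributions, groups)
print(attributions, shares)
```

In a real analysis the activations and decoder rows would come from an SAE trained on the model's residual stream, and the group labels from a separate feature-labeling step; a low share for clinically labeled features at the decision token would be consistent with the pattern the summary describes.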

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.