Vision-Language Models as Zero-Annotation Oracles in Histopathology

Source: Computer Vision and Pattern Recognition · Field: Science & Research — Health & Medical Research, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new coarse-to-fine approach addresses foreground segmentation in computational pathology by using general-purpose Vision-Language Models (VLMs) as zero-annotation oracles. It reframes tissue-versus-background discrimination as a natural-image recognition problem, so VLMs trained on internet-scale corpora generalize where domain-specific models fail, particularly on specialized stains such as Jones silver and Elastica van Gieson (EVG). On Leica-75, a benchmark of 75 renal transplant whole-slide images, the approach achieved Dice scores of 0.858 ± 0.027 on Jones and 0.853 ± 0.041 on EVG, with 7x lower cross-stain variance than leading supervised baselines. Few-shot prompting with Auto-context further improved hard cases on Stress-32, raising Dice from 0.470 to 0.819 for the 2B model. VLM-based annotation review also matched human expert consensus on blur detection, with a kappa of 0.989. Pseudo-labels generated by the framework can be used to distill lightweight student models, offering a scalable solution to a digital pathology bottleneck.
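The Dice scores quoted above measure overlap between a predicted foreground mask and a reference mask, computed as twice the intersection divided by the sum of the two mask areas. The snippet below is a minimal sketch of that standard formula with NumPy; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the original article.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks (illustrative helper).

    Dice = 2 * |pred AND target| / (|pred| + |target|).
    Assumption: two empty masks count as a perfect match (1.0).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / total)


# Toy example: 2 predicted pixels, 1 reference pixel, 1 pixel in common.
pred_mask = [[1, 1], [0, 0]]
ref_mask = [[1, 0], [0, 0]]
score = dice_score(pred_mask, ref_mask)  # 2 * 1 / (2 + 1)
```

A score of 1.0 means the masks coincide and 0.0 means they share no foreground pixels, which puts the reported jump from 0.470 to 0.819 on Stress-32 into context.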

Key takeaway

Computational pathologists who struggle with robust foreground segmentation across diverse histopathology stains should consider VLM-based approaches. The framework generalizes better than traditional supervised models, reaching a Dice score of 0.858 on Jones silver stains. Zero-annotation oracles and pseudo-label distillation can reduce manual labeling costs and improve pipeline scalability, particularly for specialized stains where existing methods fail.

Key insights

General-purpose Vision-Language Models (VLMs) offer robust, zero-annotation foreground segmentation in histopathology, generalizing across diverse stains.

Method

A coarse-to-fine approach recasts foreground segmentation as a visual perception task, using VLMs as zero-annotation oracles, followed by few-shot prompting and pseudo-label distillation.
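One plausible reading of the coarse-to-fine idea is sketched below: large blocks of a slide thumbnail are shown to an oracle first, and only blocks judged to contain tissue are subdivided and queried again at a finer scale. This summary does not specify the tiling scheme, prompts, or model interface, so everything here is an assumption: `coarse_to_fine_mask`, the block sizes, and especially `stub_oracle`, which is a simple brightness check standing in for a real VLM query so that the example runs on its own.

```python
import numpy as np


def stub_oracle(tile):
    """Hypothetical stand-in for a VLM query such as 'does this tile show tissue?'.

    Assumption: tissue is darker than a near-white background. A real
    pipeline would send the tile and a text prompt to a VLM instead.
    """
    return tile.mean() < 230


def coarse_to_fine_mask(image, oracle, coarse=64, fine=16):
    """Build a foreground mask by querying the oracle at two scales."""
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h, coarse):
        for x in range(0, w, coarse):
            # Coarse pass: skip blocks the oracle labels as background.
            if not oracle(image[y:y + coarse, x:x + coarse]):
                continue
            # Fine pass: re-query smaller tiles inside the accepted block.
            for yy in range(y, min(y + coarse, h), fine):
                for xx in range(x, min(x + coarse, w), fine):
                    if oracle(image[yy:yy + fine, xx:xx + fine]):
                        mask[yy:yy + fine, xx:xx + fine] = True
    return mask


# Synthetic grayscale "thumbnail": white background with one dark tissue region.
thumb = np.full((128, 128), 255, dtype=np.uint8)
thumb[0:64, 0:32] = 100
pseudo_label = coarse_to_fine_mask(thumb, stub_oracle)
```

The two-scale design keeps the number of oracle calls low on slides that are mostly background, since empty coarse blocks are never subdivided. A mask produced this way could then serve as a pseudo-label for training a lightweight student model, as the summary describes.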

Best for: AI Scientist, Research Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.