CPS4: Class Prompt driven Semi-Supervised Spine Segmentation with Class-specific Consistency Constraint

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

CPS4 is presented as the first text-guided semi-supervised spine segmentation network. It uses Vision Language Models (VLMs) and class prompts to improve pseudo-label quality, and it targets a specific difficulty in multi-class segmentation: keeping each textual class prompt consistent with the spine unit region it names. Training has two stages. The first is VLM pretraining with token- and pixel-level attention loss that enforces semantic coupling between prompts and spine units. The second is class prompt-driven semi-supervised segmentation, in which the pretrained vision-text encoder generates class-specific binary maps for unlabeled images and these maps are integrated into a unified multi-class segmentation map. On a public spine segmentation dataset, CPS4 reached a Dice score of 80.44% using only 5% labeled data, surpassing other semi-supervised and VLM methods.
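For readers less familiar with the reported metric, the following is a minimal sketch of how a Dice score can be computed on multi-class label maps. The function names, the toy arrays, and the choice to average over foreground classes present in either map are assumptions for illustration; the summary does not state the exact evaluation protocol used for the 80.44% figure.

```python
import numpy as np

def dice_per_class(pred, target, num_classes):
    """Dice coefficient for each foreground class (labels 1..num_classes-1).

    Assumption: label 0 is background and is excluded from scoring.
    """
    scores = {}
    for c in range(1, num_classes):
        p = pred == c
        t = target == c
        denom = p.sum() + t.sum()
        if denom == 0:
            continue  # class absent from both maps, so it is skipped
        scores[c] = 2.0 * np.logical_and(p, t).sum() / denom
    return scores

def mean_dice(pred, target, num_classes):
    """Unweighted mean of the per-class Dice scores (hypothetical protocol)."""
    scores = dice_per_class(pred, target, num_classes)
    return float(np.mean(list(scores.values()))) if scores else float("nan")

# Toy 3x3 label maps with two foreground classes.
target = np.array([[0, 1, 1],
                   [0, 2, 2],
                   [0, 0, 2]])
pred = np.array([[0, 1, 0],
                 [0, 2, 2],
                 [0, 2, 2]])

scores = dice_per_class(pred, target, num_classes=3)
avg = mean_dice(pred, target, num_classes=3)
```

In this toy case class 1 scores 2/3 (one of two target pixels recovered, no false positives) and class 2 scores 6/7 (all three target pixels recovered, one false positive).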

Key takeaway

For computer vision engineers building medical image segmentation models with limited labeled data, CPS4 suggests a practical direction: text-guided semi-supervised learning in which class prompts carry explicit consistency constraints to improve pseudo-label quality. CPS4's 80.44% Dice score with only 5% labeled data indicates that this approach can raise model performance while reducing annotation dependency.

Key insights

CPS4 enhances semi-supervised spine segmentation by using class prompts with explicit consistency constraints in VLMs.

Principles

Method

CPS4 employs a two-stage training process: first, VLM pretraining with token- and pixel-level attention loss for prompt-unit consistency; second, using the pretrained encoder to generate and integrate class-specific binary segmentation maps.
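The integration step in the second stage can be illustrated with a short sketch. The summary does not say how CPS4 fuses its class-specific binary maps, so the rule below (take the most confident class per pixel, and fall back to background when no class reaches a threshold) is an assumption chosen for simplicity, not the paper's method. The function name `merge_binary_maps` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def merge_binary_maps(prob_maps, threshold=0.5):
    """Fuse per-class foreground probabilities into one label map.

    prob_maps: array of shape (C, H, W), one probability map per class prompt.
    Returns an (H, W) integer map where 0 is background and k means class k-1
    in prob_maps. The argmax-plus-threshold rule is an illustrative assumption.
    """
    prob_maps = np.asarray(prob_maps, dtype=float)
    best_class = prob_maps.argmax(axis=0)  # most confident class per pixel
    best_prob = prob_maps.max(axis=0)      # its probability
    labels = np.where(best_prob >= threshold, best_class + 1, 0)
    return labels.astype(np.int64)

# Two class prompts on a 2x2 image.
probs = np.array([
    [[0.9, 0.6],
     [0.2, 0.1]],  # class 1 probabilities
    [[0.1, 0.7],
     [0.8, 0.3]],  # class 2 probabilities
])

labels = merge_binary_maps(probs)
```

Here the top-right pixel is claimed by both classes (0.6 and 0.7) and goes to class 2, while the bottom-right pixel stays background because neither map reaches the threshold. A fusion rule of this kind resolves overlaps between binary maps, which is one reason a unified multi-class map is needed before it is used as a pseudo label.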

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.