PEFT-MedSAM: Efficient Fine-Tuning of Medical Foundation Models for Explainable Skin Lesion Segmentation

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition, Health & Medical Research) · Depth: Advanced, quick

Summary

PEFT-MedSAM is a parameter-efficient fine-tuning method that adapts the Medical Segment Anything Model (MedSAM) to automated segmentation of dermoscopic skin lesions. It freezes MedSAM's pre-trained image and prompt encoders and trains only the lightweight mask decoder. On the ISIC 2018 benchmark, it reached a Dice coefficient of 0.9411 and an Intersection over Union (IoU) of 0.8918, outperforming a fully trained U-Net baseline (0.8715 Dice) and zero-shot MedSAM inference (0.8997 Dice). External validation on the PH2 dataset yielded a Dice coefficient of 0.9467 ± 0.0310 (standard deviation). The results were reported as statistically significant (p < 0.0001). For explainability, Grad-CAM heatmaps were evaluated with a pointing game, which reached 98.27% accuracy on a 519-image validation set, indicating that the model attends to the lesion region and supporting clinical trust.
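
The two overlap metrics quoted above have standard definitions: for a predicted mask A and a ground-truth mask B, Dice = 2|A ∩ B| / (|A| + |B|) and IoU = |A ∩ B| / |A ∪ B|. The following is a minimal sketch of both for a single image, assuming binary masks stored as nested lists of 0/1. The function name `dice_and_iou` and the toy masks are illustrative only, and the handling of two empty masks is an assumed convention, not something the source specifies.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two equal-sized binary masks (nested lists of 0/1)."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(1 for a, b in zip(p, t) if a and b)
    pred_area = sum(1 for a in p if a)
    truth_area = sum(1 for b in t if b)
    union = pred_area + truth_area - intersection

    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


# Toy 3x4 masks: the prediction covers the lesion plus one extra pixel.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0]]
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]

dice, iou = dice_and_iou(pred, truth)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

Here the intersection is 4 pixels, the predicted area is 5, and the true area is 4, so Dice is 8/9 and IoU is 4/5. Benchmark figures such as those above are typically averages of per-image scores, which this single-image sketch does not cover.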

Key takeaway

For AI scientists and machine learning engineers building medical image segmentation models, PEFT-MedSAM offers an efficient fine-tuning strategy. Consider it when adapting a large foundation model such as MedSAM to a specific task such as skin lesion segmentation: training only the mask decoder reduces computational overhead while still outperforming the U-Net and zero-shot MedSAM baselines. The approach also builds in explainability, which matters for clinical trust and for deployment in diagnostic workflows.

Key insights

PEFT-MedSAM efficiently fine-tunes medical foundation models for skin lesion segmentation by training only the mask decoder.
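
Training only the mask decoder can be sketched in PyTorch as follows. This is a toy illustration, not the paper's code. `ToySegmenter` is a hypothetical stand-in whose submodule names (`image_encoder`, `prompt_encoder`, `mask_decoder`) mirror the components named above. The layer sizes, the binary cross-entropy loss, the AdamW optimizer, and the learning rate are all placeholder assumptions.

```python
import torch
from torch import nn


class ToySegmenter(nn.Module):
    """Hypothetical stand-in with SAM-style submodule names; not the real MedSAM."""

    def __init__(self) -> None:
        super().__init__()
        self.image_encoder = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.prompt_encoder = nn.Linear(4, 8)  # encodes a box prompt (x1, y1, x2, y2)
        self.mask_decoder = nn.Conv2d(8, 1, kernel_size=1)

    def forward(self, image: torch.Tensor, box: torch.Tensor) -> torch.Tensor:
        feats = self.image_encoder(image)                     # (B, 8, H, W)
        prompt = self.prompt_encoder(box)[:, :, None, None]   # (B, 8, 1, 1)
        return self.mask_decoder(feats + prompt)              # (B, 1, H, W) logits


def freeze_encoders(model: ToySegmenter) -> list:
    """Disable gradients for both encoders; return the still-trainable parameters."""
    for module in (model.image_encoder, model.prompt_encoder):
        for param in module.parameters():
            param.requires_grad = False
    return [p for p in model.parameters() if p.requires_grad]


torch.manual_seed(0)
model = ToySegmenter()
trainable = freeze_encoders(model)

# Only the decoder parameters are handed to the optimizer (placeholder settings).
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

# Random stand-ins for a batch of images, box prompts, and ground-truth masks.
image = torch.rand(2, 3, 16, 16)
box = torch.tensor([[2.0, 2.0, 12.0, 12.0], [4.0, 4.0, 10.0, 10.0]])
target = (torch.rand(2, 1, 16, 16) > 0.5).float()

encoder_before = model.image_encoder.weight.clone()
decoder_before = model.mask_decoder.weight.clone()

# One training step: gradients reach only the mask decoder.
loss = nn.functional.binary_cross_entropy_with_logits(model(image, box), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {n_trainable} of {n_total}")
```

In this toy model only 9 of 273 parameters receive updates. The same pattern, setting `requires_grad = False` on the encoders and passing only the remaining parameters to the optimizer, is one common way to implement decoder-only fine-tuning. The source does not state which mechanism the authors used.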

Principles

Method

PEFT-MedSAM fine-tunes MedSAM by freezing its pre-trained image and prompt encoders and training only the lightweight mask decoder. Grad-CAM provides the explanations, and a pointing game evaluates how well those explanations localize the lesion.
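
The pointing game can be sketched as below. In its basic form, an explanation counts as a hit when the most salient pixel of the heatmap falls inside the ground-truth mask, and accuracy is the fraction of hits. The sketch assumes heatmaps and masks are nested lists of equal shape and applies no pixel tolerance around the maximum. The source does not say which variant the authors used, and the function names and toy data are illustrative.

```python
def pointing_game_hit(saliency, mask):
    """True if the location of the maximum saliency value lies inside the mask."""
    positions = [(r, c) for r in range(len(saliency)) for c in range(len(saliency[0]))]
    row, col = max(positions, key=lambda rc: saliency[rc[0]][rc[1]])
    return bool(mask[row][col])


def pointing_game_accuracy(pairs):
    """Fraction of (saliency, mask) pairs whose saliency peak hits the mask."""
    hits = sum(pointing_game_hit(saliency, mask) for saliency, mask in pairs)
    return hits / len(pairs)


lesion_mask = [[0, 0, 0],
               [0, 1, 1],
               [0, 1, 1]]

pairs = [
    # Peak at (1, 1): inside the lesion, a hit.
    ([[0.1, 0.2, 0.1], [0.2, 0.9, 0.4], [0.1, 0.3, 0.2]], lesion_mask),
    # Peak at (2, 2): inside the lesion, a hit.
    ([[0.0, 0.1, 0.0], [0.1, 0.3, 0.5], [0.0, 0.4, 0.8]], lesion_mask),
    # Peak at (0, 0): on background, a miss.
    ([[0.7, 0.2, 0.1], [0.2, 0.3, 0.1], [0.1, 0.1, 0.2]], lesion_mask),
]

accuracy = pointing_game_accuracy(pairs)
print(f"pointing game accuracy: {accuracy:.4f}")
```

Under this definition, the reported 98.27% on 519 images is consistent with 510 hits (510/519 ≈ 0.9827), although the source gives only the percentage.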

Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.