Post-Launch Capability Expansion of Vision-Language Models via Prompting for On-Orbit Spacecraft Inspection

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, quick

Summary

This study examines how vision-language models (VLMs) can gain new capabilities after launch for on-orbit spacecraft inspection, where updating model weights after deployment is impractical. The researchers evaluated a prompt-driven VLM, SAM3, on zero-shot instance segmentation of spacecraft components, using natural-language prompts and leaving the onboard weights unmodified. On 129 images of previously unseen satellites, with weights frozen, SAM3 achieved 0.385 mAP@0.5 and 0.267 mAP@0.5:0.95. Performance depended strongly on component scale: large elements such as spacecraft bodies (0.639 AP@0.5) and solar arrays (0.598 AP@0.5) were localized reliably, while smaller components such as antennas (0.221 AP@0.5) and thrusters (0.081 AP@0.5) remained challenging. Structured prompts that incorporate spatial and geometric descriptors improved performance by up to 82% over short category-name prompts. The model operates within the memory and compute limits of contemporary embedded GPUs.
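To make the two reported metrics concrete, here is a minimal sketch of the mask IoU test that underlies them. It assumes the usual COCO-style convention, in which mAP@0.5 counts a predicted mask as correct when its IoU with the ground truth is at least 0.5, and mAP@0.5:0.95 averages AP over the thresholds 0.50, 0.55, ..., 0.95. The function names and the toy masks are illustrative and do not come from the study.

```python
def mask_iou(pred, truth):
    """Intersection-over-union of two binary masks (equal-sized 2D lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks have no overlap to measure; treat that as IoU 0.
    return inter / union if union else 0.0


def iou_thresholds():
    """Thresholds 0.50, 0.55, ..., 0.95 (assumed COCO-style convention)."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou):
    """Thresholds at which a prediction with this IoU counts as a true positive."""
    return [t for t in iou_thresholds() if iou >= t]


# Toy example: two 2x3 blocks offset by one column (4 shared pixels, 8 in the union).
truth = [[1, 1, 1, 0],
         [1, 1, 1, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 1, 1]]

print(mask_iou(pred, truth))                      # 0.5
print(thresholds_passed(mask_iou(pred, truth)))   # [0.5]
```

A mask that only just clears IoU 0.5 contributes to mAP@0.5 but to none of the stricter thresholds, which is one reason the averaged 0.5:0.95 figure sits below the 0.5 figure.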

Key takeaway

For robotics engineers or AI scientists developing spaceborne perception systems, prompt-driven vision-language models offer a practical way to expand semantic capabilities after launch. You can add recognition of new components through prompts alone, without costly weight updates, and this works best for larger structural elements. Focus on crafting structured prompts with spatial and geometric descriptors to maximize performance, and expect weaker localization of fine-scale components, a current limitation attributed to orbital domain shift.
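One way to picture the difference between a short category-name prompt and a structured prompt is a small prompt catalog kept as data. The sketch below is hypothetical: the `ComponentPrompt` class, its fields, and the descriptor wording are assumptions made for illustration, not the prompts used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentPrompt:
    """A component category plus optional geometric and spatial descriptors."""
    category: str
    shape: str = ""      # geometric descriptor (hypothetical wording)
    location: str = ""   # spatial descriptor (hypothetical wording)

    def short(self):
        """Short prompt: the bare category name."""
        return self.category

    def structured(self):
        """Structured prompt: shape + category + location, skipping empty parts."""
        parts = [self.shape, self.category, self.location]
        return " ".join(part for part in parts if part)


# Adding a new component after launch means adding a text entry here;
# the model weights are never touched.
PROMPTS = [
    ComponentPrompt("solar array", shape="flat rectangular",
                    location="extending from the spacecraft body"),
    ComponentPrompt("thruster", shape="small conical",
                    location="mounted on the spacecraft body"),
]

for prompt in PROMPTS:
    print(prompt.short(), "->", prompt.structured())
```

Keeping the catalog as plain text means the prompt set can be changed without any change to the onboard weights.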

Key insights

Prompt-driven vision-language models enable post-launch semantic expansion for spaceborne inspection without onboard weight updates.

Method

The study evaluates zero-shot instance segmentation of spacecraft components using a prompt-driven vision-language model (SAM3) on 129 images of previously unseen satellites, under a strictly frozen, single-pass inference protocol.
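The frozen, single-pass protocol can be sketched as a loop that queries the model exactly once per image and prompt, with no fine-tuning, retries, or prompt refinement between calls. This is an illustration under assumptions: `segment` stands in for whatever inference call the deployed model exposes (it is not SAM3's actual API), and `fake_segment` is a dummy used only so the example runs.

```python
def run_frozen_single_pass(segment, image_ids, prompts):
    """Query a frozen model once per (image, category) pair.

    segment:   callable (image_id, prompt_text) -> list of predicted masks
               (hypothetical interface, not a real library call)
    image_ids: iterable of image identifiers
    prompts:   dict mapping category name -> prompt text
    """
    results = {}
    for image_id in image_ids:
        for category, prompt_text in prompts.items():
            # One forward pass; the output is stored as-is, never fed back in.
            results[(image_id, category)] = segment(image_id, prompt_text)
    return results


calls = []  # records every model query so the single-pass property can be checked


def fake_segment(image_id, prompt_text):
    """Dummy model: returns one 1x1 mask for 'body' prompts, nothing otherwise."""
    calls.append((image_id, prompt_text))
    return [[[1]]] if "body" in prompt_text else []


prompts = {"body": "spacecraft body", "thruster": "thruster"}
results = run_frozen_single_pass(fake_segment, ["img_001", "img_002"], prompts)

print(len(calls))                         # 4: two images x two prompts
print(results[("img_001", "body")])       # [[[1]]]
print(results[("img_002", "thruster")])   # []
```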

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.