Post-Launch Capability Expansion of Vision-Language Models via Prompting for On-Orbit Spacecraft Inspection
Summary
A study investigates the post-launch capability expansion of vision-language models (VLMs) for on-orbit spacecraft inspection, addressing the impracticality of updating model weights after deployment. Researchers evaluated prompt-driven VLMs, specifically SAM3, for zero-shot instance segmentation of spacecraft components using natural-language prompts without modifying onboard weights. Tested on 129 images of previously unseen satellites with frozen weights, SAM3 achieved 0.385 mAP@0.5 and 0.267 mAP@0.5:0.95. Performance was highly scale-dependent: large elements such as spacecraft bodies (0.639 AP@0.5) and solar arrays (0.598 AP@0.5) were localized reliably, while smaller components such as antennas (0.221 AP@0.5) and thrusters (0.081 AP@0.5) remained challenging. Structured prompts incorporating spatial and geometric descriptors improved performance by up to 82% over short category-name prompts. The model operates within the memory and compute limits of contemporary embedded GPUs.
Key takeaway
For robotics engineers or AI scientists developing spaceborne perception systems, prompt-driven vision-language models offer a practical solution for expanding semantic capabilities post-launch. You can add new component recognition without costly weight updates, especially for larger structural elements. Focus on crafting structured prompts with spatial and geometric descriptors to maximize performance, while acknowledging current limitations for fine-scale component localization due to orbital domain shift.
Key insights
Prompt-driven vision-language models enable post-launch semantic expansion for spaceborne inspection without onboard weight updates.
Principles
- Post-launch model updates are operationally impractical for spaceborne systems.
- Prompt formulation significantly influences VLM performance in zero-shot tasks.
- Zero-shot VLM performance is strongly scale-dependent for object localization.
Method
The study evaluates zero-shot instance segmentation of spacecraft components using prompt-driven vision-language models (SAM3) on 129 unseen satellite images, under a strictly frozen, single-pass inference protocol.
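The two reported metrics differ only in how strictly a predicted mask must overlap the ground truth. The sketch below is not the study's evaluation code; it illustrates the usual convention, in which mAP@0.5 counts a prediction as correct at an intersection-over-union (IoU) of at least 0.5, while mAP@0.5:0.95 averages over ten thresholds from 0.50 to 0.95 in steps of 0.05. The helper names and the toy masks are hypothetical.

```python
def mask_iou(pred, truth):
    """IoU of two binary masks given as equal-sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return intersection / union if union else 0.0


# IoU thresholds conventionally averaged for mAP@0.5:0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou):
    """Thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in IOU_THRESHOLDS if iou >= t]


# Toy example (hypothetical data): the prediction covers 4 of 6 true pixels.
pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
truth = [[1, 1, 1],
         [1, 1, 1],
         [0, 0, 0]]

iou = mask_iou(pred, truth)
passed = thresholds_passed(iou)
```

In this toy case the prediction counts as a match at the 0.5 threshold but fails the stricter thresholds from 0.70 upward, which is one reason the averaged metric tends to be lower than mAP@0.5.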
In practice
- Utilize structured prompts for improved VLM performance.
- Consider VLMs for dynamic semantic expansion of large components.
- VLMs can extend capabilities without modifying onboard model weights.
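As an illustration of the first point, the sketch below builds a short category-name prompt and a structured prompt that adds geometric and spatial descriptors. The `ComponentPrompt` class, its fields, and the example descriptors are assumptions made for illustration; the summary does not give the study's actual prompt wording, and no model API is called here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComponentPrompt:
    """Hypothetical container for one component's prompt ingredients."""
    category: str                                        # e.g. "solar array"
    geometry: List[str] = field(default_factory=list)    # shape descriptors
    location: str = ""                                   # spatial descriptor

    def short(self) -> str:
        """Baseline prompt: the category name only."""
        return self.category

    def structured(self) -> str:
        """Category name enriched with geometric and spatial descriptors."""
        parts = []
        if self.geometry:
            parts.append(", ".join(self.geometry))
        parts.append(self.category)
        text = " ".join(parts)
        if self.location:
            text += " " + self.location
        return text


# Example descriptors are invented for illustration only.
solar_array = ComponentPrompt(
    category="solar array",
    geometry=["flat", "rectangular"],
    location="extending from the spacecraft body",
)

short_prompt = solar_array.short()
structured_prompt = solar_array.structured()
```

Because the prompts are plain strings, a new component could in principle be added after launch by uplinking text rather than weights, which is the operational advantage the article highlights.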
Topics
- Spacecraft Inspection
- Vision-Language Models
- Prompt Engineering
- Zero-shot Learning
- On-orbit Servicing
- Instance Segmentation
- Embedded GPUs
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.