Multimodal LLM-Empowered Re-Ranking for Generalizable Person Re-Identification

2026-06-15 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

This work targets Domain Generalizable (DG) person re-identification (Re-ID), where a model must perform in scenarios it never saw during training, by improving the inference-time re-ranking stage. Most DG Re-ID methods focus on learning a generalizable encoder; this one instead proposes an MLLM-empowered distance metric for re-ranking. A Multimodal Large Language Model (MLLM) is adapted to Re-ID data through supervised fine-tuning, using a domain-agnostic prompt and a query-candidate hard mining scheme. At inference, the adapted MLLM computes a "μ-distance" that is designed to be robust to domain gaps, and this distance feeds the subsequent re-ranking step. The approach is model-agnostic and plugs into existing re-ranking frameworks. The authors report consistent, substantial improvements across multiple DG Re-ID benchmarks.

Key takeaway

Machine learning engineers building domain-generalizable person Re-ID systems should consider MLLM-empowered re-ranking. Instead of relying on encoder features alone, the approach computes a μ-distance at inference that is designed to hold up across domain gaps, which is where conventional re-ranking tends to degrade. Because it is model-agnostic, you can add it to an existing re-ranking framework by adapting an MLLM with a domain-agnostic prompt and hard mining, potentially reducing the need for extensive encoder retraining.

Key insights

An MLLM-empowered μ-distance metric significantly improves domain-generalizable person re-identification by enhancing inference-stage re-ranking robustness to domain gaps.

Principles

Inference re-ranking is critical for DG Re-ID.
MLLMs offer strong generalization capabilities.
Domain-agnostic prompts enhance MLLM adaptation.

Method

Adapt an MLLM via supervised fine-tuning with a domain-agnostic prompt and hard mining. Employ it to compute a μ-distance for robust re-ranking during inference.
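
To make the inference step concrete, here is a minimal Python sketch of re-ranking a shortlist with a pluggable pairwise distance. It is an illustration under stated assumptions, not the paper's algorithm: this summary does not define the μ-distance or say how it enters the re-ranking framework, so `pair_distance` is a hypothetical stand-in for the adapted MLLM's output, and the linear fusion weight `alpha` and shortlist size `k` are illustrative choices.

```python
from typing import Callable, List, Sequence


def rerank_top_k(
    encoder_dists: Sequence[float],
    pair_distance: Callable[[int], float],
    k: int = 10,
    alpha: float = 0.5,
) -> List[int]:
    """Return gallery indices, with the top-k re-ordered by a fused distance.

    encoder_dists: query-to-gallery distances from any Re-ID encoder.
    pair_distance: hypothetical callable mapping a gallery index to a
        query-candidate distance (a stand-in for an MLLM-derived score).
    """
    # Initial ranking from the encoder: smaller distance means a better match.
    initial = sorted(range(len(encoder_dists)), key=lambda i: encoder_dists[i])
    head, tail = initial[:k], initial[k:]

    # Score only the shortlist, since each pairwise call would be one
    # comparatively expensive model forward pass.
    fused = {
        i: alpha * encoder_dists[i] + (1.0 - alpha) * pair_distance(i)
        for i in head
    }

    # Re-order the shortlist by fused distance; leave the tail untouched.
    return sorted(head, key=lambda i: fused[i]) + tail


# Toy usage with made-up numbers in place of real model outputs.
toy_pair_scores = {0: 0.05, 1: 0.80, 2: 0.40, 3: 0.10}
ranking = rerank_top_k(
    [0.30, 0.10, 0.20, 0.90], toy_pair_scores.__getitem__, k=3, alpha=0.5
)
```

Restricting the pairwise scoring to a shortlist keeps the number of model calls per query fixed at `k`, which is one plausible way to keep such a metric affordable at inference.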

In practice

Integrate MLLM-based μ-distance into re-ranking.
Fine-tune MLLMs with hard mining for Re-ID.
Explore domain-agnostic prompting for MLLM adaptation.
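
One way to picture query-candidate hard mining for the fine-tuning data is sketched below in Python. The summary does not describe the paper's actual mining scheme, so this is an assumption: it uses the common heuristic of treating same-identity candidates that sit far from the query as hard positives and different-identity candidates that sit close as hard negatives. The function name `mine_hard_pairs` and its parameters are hypothetical.

```python
from typing import Dict, List, Sequence


def mine_hard_pairs(
    query_id: int,
    cand_ids: Sequence[int],
    cand_dists: Sequence[float],
    n_pos: int = 1,
    n_neg: int = 1,
) -> Dict[str, List[int]]:
    """Pick the hardest positive and negative candidate indices for one query.

    cand_ids: identity label of each candidate image.
    cand_dists: query-to-candidate distances from an existing encoder.
    """
    pos = [i for i, pid in enumerate(cand_ids) if pid == query_id]
    neg = [i for i, pid in enumerate(cand_ids) if pid != query_id]

    # Hard positives: same identity but far away under the encoder.
    hard_pos = sorted(pos, key=lambda i: cand_dists[i], reverse=True)[:n_pos]
    # Hard negatives: different identity but close under the encoder.
    hard_neg = sorted(neg, key=lambda i: cand_dists[i])[:n_neg]

    return {"positive": hard_pos, "negative": hard_neg}


# Toy usage: the query has identity 7; candidates 0 and 2 share it.
pairs = mine_hard_pairs(7, [7, 3, 7, 5], [0.2, 0.1, 0.6, 0.4])
```

Each selected index would then be paired with the query image and a match or no-match label to form one supervised fine-tuning example.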

Topics

Person Re-Identification
Domain Generalization
Multimodal Large Language Models
Re-ranking
Supervised Fine-tuning
Computer Vision

Code references

RikoLi/MUSE

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.