MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning
Summary
MODF-SIR is a multi-agent collaborative framework for social intelligence reasoning, built on a lightweight Multimodal Large Language Model (MLLM). It uses knowledge distillation to strengthen both training and inference. The framework localizes the relevant segments of multi-modal social intelligence data, identifies long-tail events, and renders those events as formatted, explicit text so that critical information is not lost during tokenization. Test-Time Adaptation (TTA) is applied across the entire reasoning pipeline: long-tail event processing, Chain-of-Thought (CoT) prompting, and self-reflection. The TTA mechanism is itself distillation-enhanced and uses Low-Rank Adaptation (LoRA) to fine-tune the foundation model for instance-level reasoning. Extensive evaluations show that MODF-SIR achieves state-of-the-art results while using approximately 30% of the training data from IntentTrain. The code, a demo, the LoRA weights, and the training dataset are publicly available.
Key takeaway
For Machine Learning Engineers developing Multimodal Large Language Models for social intelligence, MODF-SIR presents a validated approach to enhance reasoning, particularly with challenging long-tail events. You should consider integrating knowledge distillation, explicit long-tail event formatting, and Test-Time Adaptation with LoRA into your MLLM pipelines. This framework demonstrates state-of-the-art performance and offers publicly available code and models, providing a strong foundation for your own development efforts.
Key insights
A multi-agent, omni-modal framework uses distillation and TTA with LoRA for social intelligence reasoning.
Principles
- Knowledge distillation enhances MLLM training and inference.
- Rendering long-tail events as explicit, formatted text keeps them from being lost during tokenization.
- Test-Time Adaptation improves instance-level reasoning.
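To make the first principle concrete, here is a minimal sketch of a soft-label distillation loss. The summary does not say which distillation objective MODF-SIR uses, so this shows the common temperature-scaled KL formulation purely as an illustration. The function names and the temperature value are assumptions, not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Illustrative only: the objective and temperature are assumptions,
    not the loss used by MODF-SIR.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what lets a lightweight student model learn from a stronger teacher.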
Method
The framework localizes multi-modal data, extracts and formats long-tail events as explicit text, then applies distillation-enhanced Test-Time Adaptation (TTA) with LoRA for instance-level fine-tuning across the reasoning pipeline.
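The extract-and-format step can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the `Event` fields, the rarity threshold `max_count`, and the `[LONG-TAIL EVENT]` text template are all assumptions introduced to show the general idea of turning rare, localized events into explicit text that can be placed in a prompt.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A localized event (hypothetical schema)."""
    modality: str   # e.g. "video", "audio", "text"
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds
    label: str


def find_long_tail(events, label_counts, max_count=5):
    """Keep events whose label is rare in a reference corpus.

    The frequency threshold is an assumed criterion for "long-tail".
    """
    return [e for e in events if label_counts.get(e.label, 0) <= max_count]


def format_events(events):
    """Render events as explicit text lines, ordered by start time."""
    ordered = sorted(events, key=lambda e: e.start_s)
    return "\n".join(
        f"[LONG-TAIL EVENT] modality={e.modality} "
        f"time={e.start_s:.1f}-{e.end_s:.1f}s label={e.label}"
        for e in ordered
    )


events = [
    Event("audio", 3.2, 4.1, "sigh"),
    Event("video", 1.0, 2.0, "smile"),
    Event("video", 0.5, 0.9, "eye_roll"),
]
counts = Counter({"smile": 900, "sigh": 4, "eye_roll": 2})
explicit_text = format_events(find_long_tail(events, counts))
```

Here the frequent `smile` event is dropped and the two rare events are written out as plain text, so their content reaches the model as ordinary tokens rather than depending on how the raw modality is encoded.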
In practice
- Utilize LoRA for instance-level MLLM fine-tuning.
- Render long-tail events as explicit, formatted text so they are not lost during tokenization.
- Implement TTA for improved reasoning pipelines.
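For readers new to LoRA, the sketch below shows the low-rank update it relies on: the frozen weight matrix W is adjusted by a scaled product of two small matrices, W + (alpha / r) * B A, and only A and B are trained. It uses plain Python lists to stay dependency-free. The helper names are hypothetical, and how MODF-SIR schedules these updates per instance at test time is not described in this summary.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return the effective weight W + (alpha / r) * B @ A.

    w: d_out x d_in frozen base weight
    a: r x d_in trainable matrix
    b: d_out x r trainable matrix (conventionally initialized to zeros)
    """
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in A and B, versus d_out * d_in for full tuning."""
    return rank * (d_in + d_out)


base = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
a = [[1.0, 0.0, -1.0]]

# With B at zero the adapter is a no-op, so adaptation starts from the base model.
unchanged = lora_weight(base, a, [[0.0], [0.0]], alpha=2.0)

# After B moves away from zero, the effective weight shifts by a rank-1 update.
adapted = lora_weight(base, a, [[1.0], [2.0]], alpha=2.0)
```

Because only A and B change, an instance-level adaptation can be applied and discarded cheaply without touching the foundation model's weights.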
Topics
- Multi-agent Systems
- Multimodal LLMs
- Social Intelligence Reasoning
- Knowledge Distillation
- Test-Time Adaptation
- LoRA
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.