MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning

2026-06-10 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

MODF-SIR is a multi-agent collaborative framework for social intelligence reasoning, built on a lightweight Multimodal Large Language Model (MLLM). Knowledge distillation strengthens both its training and inference phases. The framework localizes multi-modal social intelligence data, identifies long-tail events, and renders them as formatted, explicit text so that critical information is not lost during tokenization. Test-Time Adaptation (TTA) runs across the entire reasoning pipeline, covering long-tail event processing, Chain-of-Thought (CoT) prompting, and self-reflection. The TTA mechanism is itself distillation-enhanced and uses Low-Rank Adaptation (LoRA) to fine-tune the foundation model for instance-level reasoning. Extensive evaluations report state-of-the-art results while using only about 30% of the IntentTrain training data. The code, a demo, the LoRA weights, and the training dataset are publicly available.

Key takeaway

For Machine Learning Engineers building Multimodal Large Language Models for social intelligence, MODF-SIR offers a validated way to improve reasoning, particularly on challenging long-tail events. Consider integrating knowledge distillation, explicit long-tail event formatting, and Test-Time Adaptation with LoRA into your MLLM pipelines. The framework reports state-of-the-art performance, and its code, demo, LoRA weights, and training dataset are public, so you can reproduce the results and build on them.

Key insights

A multi-agent, omni-modal framework uses distillation and TTA with LoRA for social intelligence reasoning.

Principles

Knowledge distillation enhances MLLM training and inference.
Rendering long-tail events as explicit text keeps them from being lost during tokenization.
Test-Time Adaptation improves instance-level reasoning.

Method

The framework localizes multi-modal data, extracts and formats long-tail events as explicit text, then applies distillation-enhanced Test-Time Adaptation (TTA) with LoRA for instance-level fine-tuning across the reasoning pipeline.
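
The extract-and-format step above can be illustrated with a minimal sketch. It is not the authors' implementation: the Event fields, the find_long_tail and render_explicit helpers, the 5% rarity threshold, and the [LONG_TAIL ...] tag layout are all hypothetical choices made for illustration. The idea shown is only that rare events are picked out against reference counts and written into the prompt as explicit, tagged text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One localized observation. All fields are illustrative assumptions."""
    modality: str   # e.g. "video", "audio", "text"
    label: str      # e.g. "eye_roll"
    start_s: float  # start of the localized span, in seconds
    end_s: float    # end of the localized span, in seconds


def find_long_tail(events, corpus_counts, max_share=0.05):
    """Keep events whose label is at most `max_share` of the reference counts."""
    total = sum(corpus_counts.values())
    if total == 0:
        # No reference statistics: treat every event as potentially rare.
        return list(events)
    return [e for e in events if corpus_counts.get(e.label, 0) / total <= max_share]


def render_explicit(events):
    """Render each event as one tagged line, ordered by start time."""
    lines = [
        f"[LONG_TAIL modality={e.modality} span={e.start_s:.1f}-{e.end_s:.1f}s] {e.label}"
        for e in sorted(events, key=lambda e: e.start_s)
    ]
    return "\n".join(lines)


# Hypothetical reference counts and one clip's localized events.
corpus_counts = Counter({"smile": 60, "nod": 37, "eye_roll": 3})
clip_events = [
    Event("video", "smile", 0.0, 1.5),
    Event("video", "eye_roll", 2.0, 2.4),
]
print(render_explicit(find_long_tail(clip_events, corpus_counts)))
```

The rendered lines would be prepended to the model prompt, so a rare cue is stated in words rather than left implicit in the raw multi-modal input.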

In practice

Use LoRA for instance-level MLLM fine-tuning.
Render long-tail events as explicit, formatted text so they are not lost during tokenization.
Apply TTA across the reasoning pipeline: long-tail event processing, CoT prompting, and self-reflection.
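
To make instance-level adaptation with LoRA concrete, here is a toy sketch in plain Python. It is an assumption-laden illustration, not the MODF-SIR training code: the model is a single linear unit, the adapter is rank 1, the teacher_out value stands in for a distillation signal, and the helper names, learning rate, and step count are hypothetical. What it shows is the general pattern: the base weights stay frozen, a fresh low-rank correction is fitted on one test instance, and that correction is discarded afterwards.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def predict(w0, lora_a, lora_b, x, scale):
    """Frozen base output plus the scaled rank-1 LoRA correction."""
    return dot(w0, x) + scale * lora_b * dot(lora_a, x)


def adapt_on_instance(w0, x, teacher_out, steps=20, lr=0.05, alpha=1.0, rank=1):
    """Fit a fresh rank-1 adapter to ONE test instance; w0 is never modified."""
    scale = alpha / rank
    lora_a = [0.1] * len(x)  # small non-zero init (typically random in practice)
    lora_b = 0.0             # zero init, so the adapter starts as a no-op
    for _ in range(steps):
        err = predict(w0, lora_a, lora_b, x, scale) - teacher_out
        # Gradients of the squared error, both computed before any update.
        grad_b = 2 * err * scale * dot(lora_a, x)
        grad_a = [2 * err * scale * lora_b * xi for xi in x]
        lora_b -= lr * grad_b
        lora_a = [ai - lr * gi for ai, gi in zip(lora_a, grad_a)]
    return lora_a, lora_b, scale


# Hypothetical frozen weights, one test instance, and a teacher output.
w0 = [0.5, -0.25]
x = [1.0, 2.0]
teacher_out = 1.0

loss_before = (dot(w0, x) - teacher_out) ** 2
lora_a, lora_b, scale = adapt_on_instance(w0, x, teacher_out)
loss_after = (predict(w0, lora_a, lora_b, x, scale) - teacher_out) ** 2
print(loss_before, loss_after)
```

Because only the adapter parameters change, the next test instance starts again from the same frozen base weights. In a real MLLM the same pattern would apply to low-rank adapters on selected weight matrices, with whatever adaptation objective the pipeline defines.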

Topics

Multi-agent Systems
Multimodal LLMs
Social Intelligence Reasoning
Knowledge Distillation
Test-Time Adaptation
LoRA

Code references

eeee-sys/MODF-SIR

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.