A Lightweight LLM Framework for Disaster Humanitarian Information Classification

2026-02-16 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Crisis Informatics · Depth: Expert, extended

Summary

This study introduces a lightweight, cost-effective framework for classifying disaster-related tweets using parameter-efficient fine-tuning of the Llama 3.1 8B model. The researchers integrated and normalized the HumAID dataset, comprising 76,484 tweets across 19 disaster events, into a dual-task benchmark covering humanitarian information categorization (10 classes) and event type identification (4 classes). LoRA fine-tuning achieved 79.62% accuracy on humanitarian classification, a 37.79% improvement over zero-shot prompting, while training only about 2% of the model's parameters. QLoRA retained 99.4% of LoRA's performance at half the memory cost, making deployment more efficient. Contrary to common assumptions, Retrieval-Augmented Generation (RAG) strategies degraded the fine-tuned model's performance because retrieved examples introduced label noise, which suggests that RAG's utility is inversely correlated with model capability. The research also finds that persistent classification ambiguities in categories such as "other_relevant_information" stem from limitations of the taxonomy itself rather than from model inadequacy.

Key takeaway

AI engineers building crisis intelligence systems should prioritize parameter-efficient fine-tuning (LoRA/QLoRA) over Retrieval-Augmented Generation (RAG) for well-defined classification tasks. In this study, fine-tuning delivered higher accuracy than prompting or retrieval, and QLoRA halved the memory cost, which matters most in resource-constrained environments. Also consider refining or merging ambiguous classification categories: even a well-tuned model struggles with poorly defined labels, so a cleaner taxonomy improves system reliability.

Key insights

Parameter-efficient fine-tuning significantly outperforms prompting and RAG for disaster tweet classification, enabling efficient, accurate crisis intelligence.

Principles

LoRA fine-tuning is superior for task-specific classification.
RAG utility is inversely correlated with model capability.
Taxonomy ambiguity can constrain classification performance.

Method

The framework uses LoRA/QLoRA fine-tuning on Llama 3.1 8B for dual-task classification of disaster tweets, evaluating against zero-shot, few-shot, and RAG strategies on the HumAID dataset.
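
To make the "about 2% of parameters" figure concrete, the sketch below counts the weights a LoRA adapter would add. LoRA attaches two low-rank factors to each adapted weight matrix of shape (in, out), which adds rank × (in + out) trainable parameters per matrix. The layer shapes are assumed from the public Llama 3.1 8B configuration, and the ranks and target modules are illustrative only; the paper's actual settings are not given in this summary.

```python
# Hypothetical sketch: count LoRA adapter parameters for a Llama-style decoder.
# Shapes are assumed from the public Llama 3.1 8B config; the ranks and the
# choice of adapted layers are illustrative, not taken from the paper.

HIDDEN = 4096          # model width
INTERMEDIATE = 14336   # MLP inner width
KV_DIM = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32        # decoder blocks
TOTAL_PARAMS = 8.03e9  # approximate size of the base model

# (in_features, out_features) for each linear layer in one decoder block
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, target_modules):
    """Trainable parameters added by LoRA across all decoder blocks."""
    per_block = sum(
        rank * (LINEAR_SHAPES[name][0] + LINEAR_SHAPES[name][1])
        for name in target_modules
    )
    return per_block * NUM_LAYERS


def trainable_fraction(rank, target_modules):
    """Adapter parameters as a fraction of the (approximate) base model size."""
    return lora_param_count(rank, target_modules) / TOTAL_PARAMS


if __name__ == "__main__":
    all_linear = list(LINEAR_SHAPES)
    for rank in (8, 16, 64):
        count = lora_param_count(rank, all_linear)
        share = trainable_fraction(rank, all_linear)
        print(f"rank={rank:>2}: {count:,} adapter params ({share:.2%} of base)")
```

Under these assumptions, adapting every linear layer with a rank in the tens lands between a fraction of a percent and the low single digits, the same order of magnitude as the roughly 2% reported above.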

In practice

Prioritize LoRA/QLoRA for classification tasks.
Merge semantically overlapping categories for clarity.
Focus resources on high-value, unambiguous labels.
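
The category-merging advice above can be sketched as a relabeling pass applied before fine-tuning. Only "other_relevant_information" comes from the summary; every other label name, the merged target name, and the sample tweets are hypothetical placeholders, and which categories actually overlap is a judgment that needs domain review.

```python
from collections import Counter

# Hypothetical merge map: only "other_relevant_information" appears in the
# summary. The second source label and the merged name are placeholders.
MERGE_MAP = {
    "other_relevant_information": "general_information",
    "hypothetical_overlapping_label": "general_information",
}


def merge_label(label, merge_map=MERGE_MAP):
    """Return the merged label, or the original label if it is not remapped."""
    return merge_map.get(label, label)


def merge_dataset(rows, merge_map=MERGE_MAP):
    """Relabel (text, label) pairs without modifying the input list."""
    return [(text, merge_label(label, merge_map)) for text, label in rows]


def label_counts(rows):
    """Count examples per label, useful for checking class balance after a merge."""
    return Counter(label for _, label in rows)


if __name__ == "__main__":
    # Made-up examples for illustration only; not drawn from HumAID.
    sample = [
        ("Road closures reported downtown", "other_relevant_information"),
        ("Weather update for the region", "hypothetical_overlapping_label"),
        ("Shelter needs blankets tonight", "hypothetical_urgent_needs"),
    ]
    merged = merge_dataset(sample)
    print(label_counts(sample))
    print(label_counts(merged))
```

Comparing the label counts before and after the merge is a quick check that the new taxonomy has not produced one oversized catch-all class.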

Topics

Disaster Response AI
Large Language Models
Parameter-Efficient Fine-Tuning
LoRA/QLoRA
Retrieval-Augmented Generation

Code references

KaiYin97/CrsisLLM

Best for: AI Engineer, Machine Learning Engineer, Research Scientist, AI Researcher, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.