Multi-Field Hybrid Retrieval-Augmented Generation for Maritime Accident Root Cause Analysis
Summary
A multi-field hybrid retrieval-augmented generation (RAG) framework is proposed for automated maritime accident root cause analysis (RCA). The system is built on 13,329 Korea Maritime Safety Tribunal (KMST) reports spanning 1971–2025. The framework converts raw adjudications into structured "incident cards" with separate Summary, Causes, and Disposition fields, together with a hierarchical L1/L2 cause taxonomy. Retrieval is field-aware and hybrid: sparse and dense rankings are fused with Reciprocal Rank Fusion (RRF). In experiments, the proposed retrieval substantially outperforms baselines, raising NormRecall@100 from 0.18 to 0.55. Grounding the generator on retrieved precedents also improves RCA generation quality, raising the LLM-as-a-judge score from 3.34 to 3.72 over an LLM-only baseline.
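For reference, the standard RRF score of a document d over a set R of input rankings is given below. Here rank_r(d) is the 1-based position of d in ranking r, a ranking that does not contain d contributes nothing, and k is a smoothing constant. The summary does not report the authors' value of k or exactly which rankings they fuse, so treating them as per-field sparse and dense rankings is an assumption.

```latex
\mathrm{RRF}(d) = \sum_{r \in R,\; d \in r} \frac{1}{k + \mathrm{rank}_r(d)}
```

For example, with the commonly used k = 60, a card ranked 1st by one retriever and 3rd by another scores 1/61 + 1/63 ≈ 0.0323.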
Key takeaway
For AI scientists and machine learning engineers building RAG systems for specialized domains such as legal or medical reports, this research demonstrates the value of a multi-field hybrid approach. Consider structuring complex documents into field-specific "incident cards" and using Reciprocal Rank Fusion to combine sparse and dense retrieval. In the maritime setting, this improved both retrieval accuracy and the quality of generated analyses, which can streamline evidence-based report drafting.
Key insights
Field-aware hybrid RAG improves maritime accident root cause analysis: structuring reports into field-level incident cards improves precedent retrieval, and grounding generation in those precedents improves the generated analyses.
Principles
- Structuring unstructured reports into "incident cards" improves retrieval.
- Hybrid retrieval fusing sparse and dense rankings is effective.
- Field-aware indexing enhances relevance in complex documents.
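The field-aware principle can be sketched by building one sparse index per incident-card field, so that a cause-oriented query is matched against Causes text without being diluted by Summary or Disposition text. This is a minimal sketch, assuming Okapi BM25 as the sparse scorer and a naive whitespace tokenizer; the `FieldBM25` class, `build_field_indexes` helper, field keys, and toy cards are hypothetical, and the summary does not say which sparse model the authors use. Non-English text such as Korean tribunal reports would likely need a proper morphological tokenizer.

```python
import math
from collections import Counter

# Hypothetical field keys mirroring the incident-card schema.
FIELDS = ("summary", "causes", "disposition")


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (illustration only).
    return text.lower().split()


class FieldBM25:
    """Okapi BM25 over a single field of every card."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
        self.n = len(self.tokens)
        self.avgdl = sum(len(t) for t in self.tokens.values()) / max(self.n, 1)
        # Document frequency: how many cards contain each term in this field.
        self.df = Counter(t for toks in self.tokens.values() for t in set(toks))

    def idf(self, term: str) -> float:
        # BM25 idf with +1 inside the log so it is never negative.
        df = self.df.get(term, 0)
        return math.log((self.n - df + 0.5) / (df + 0.5) + 1.0)

    def score(self, query: str, doc_id: str) -> float:
        toks = self.tokens[doc_id]
        tf = Counter(toks)
        # Length normalization relative to the average field length.
        length_norm = (1 - self.b + self.b * len(toks) / self.avgdl) if self.avgdl else 1.0
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def rank(self, query: str) -> list[str]:
        # Card IDs with a positive score, best first (ties broken by ID).
        scored = [(self.score(query, d), d) for d in self.tokens]
        return [d for s, d in sorted(scored, key=lambda x: (-x[0], x[1])) if s > 0]


def build_field_indexes(cards: dict[str, dict[str, str]]) -> dict[str, FieldBM25]:
    # One independent index per field, so term statistics never mix across fields.
    return {f: FieldBM25({cid: card.get(f, "") for cid, card in cards.items()}) for f in FIELDS}


# Hypothetical toy cards (invented text, not KMST data).
cards = {
    "c1": {"summary": "collision in fog near port",
           "causes": "failure to keep proper lookout",
           "disposition": "license suspension"},
    "c2": {"summary": "engine room fire during voyage",
           "causes": "fuel leak onto hot surface",
           "disposition": "warning"},
}
indexes = build_field_indexes(cards)
print(indexes["causes"].rank("proper lookout"))  # ['c1']
print(indexes["causes"].rank("fire"))            # []
```

The query "fire" finds nothing in the Causes index even though card c2 mentions a fire in its Summary, which is the point of keeping fields apart. A dense retriever could be indexed per field in the same way and the per-field rankings fused afterwards.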
Method
The framework transforms raw adjudications into structured "incident cards" with Summary, Causes, and Disposition fields, indexed with a hierarchical cause taxonomy. It then retrieves precedents with a field-aware hybrid strategy that fuses sparse and dense rankings via Reciprocal Rank Fusion.
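A hypothetical incident-card schema makes this structure concrete. This sketch assumes a Python dataclass, a made-up L2-to-L1 mapping (`L2_TO_L1`), hypothetical helpers (`field_text`, `filter_by_l1`), and invented example text; the authors' actual taxonomy and card format are not given in this summary.

```python
from dataclasses import dataclass, field

# Hypothetical two-level cause taxonomy (L2 code -> L1 parent); the
# authors' actual taxonomy is not reproduced in this summary.
L2_TO_L1 = {
    "lookout_failure": "human_error",
    "unsafe_speed": "human_error",
    "engine_failure": "equipment",
    "steering_failure": "equipment",
    "heavy_weather": "environment",
}

FIELDS = ("summary", "causes", "disposition")


@dataclass
class IncidentCard:
    card_id: str
    summary: str
    causes: str
    disposition: str
    l2_causes: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject labels that are not in the taxonomy.
        unknown = [c for c in self.l2_causes if c not in L2_TO_L1]
        if unknown:
            raise ValueError(f"unknown L2 cause codes: {unknown}")

    @property
    def l1_causes(self) -> list[str]:
        # Roll L2 labels up to L1 parents, deduplicated in first-seen order.
        return list(dict.fromkeys(L2_TO_L1[c] for c in self.l2_causes))

    def field_text(self, name: str) -> str:
        # Return one field's text so each field can be indexed separately.
        if name not in FIELDS:
            raise KeyError(name)
        return getattr(self, name)


def filter_by_l1(cards: list[IncidentCard], l1: str) -> list[str]:
    # IDs of cards carrying at least one cause under the given L1 class.
    return [c.card_id for c in cards if l1 in c.l1_causes]


# Hypothetical example cards (invented text, not KMST data).
cards = [
    IncidentCard(
        "card-1",
        summary="Collision between a fishing vessel and a cargo ship in fog.",
        causes="Neither vessel kept a proper lookout; the cargo ship kept unsafe speed.",
        disposition="Deck officer's license suspended.",
        l2_causes=["lookout_failure", "unsafe_speed"],
    ),
    IncidentCard(
        "card-2",
        summary="Drifting after loss of propulsion in rough seas.",
        causes="Main engine stopped; heavy weather prevented towing.",
        disposition="Warning issued to the chief engineer.",
        l2_causes=["engine_failure", "heavy_weather"],
    ),
]

print(cards[0].l1_causes)                # ['human_error']
print(filter_by_l1(cards, "equipment"))  # ['card-2']
```

Storing only L2 labels and deriving L1 labels keeps the two levels consistent by construction, and the L1 roll-up gives a coarse filter that can be applied before or after retrieval.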
In practice
- Structure legal documents into "incident cards" for RAG.
- Use Reciprocal Rank Fusion for hybrid retrieval.
- Index specific document fields for enhanced relevance.
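The fusion step can be sketched as a standard RRF implementation over several ranked lists of card IDs, for instance one sparse and one dense ranking per field. This is a minimal sketch, assuming k = 60 (the conventional default, not a value reported here) and hypothetical rankings; the function name `reciprocal_rank_fusion` is illustrative, and ties are broken by card ID so the output is deterministic.

```python
from collections import defaultdict
from typing import Optional


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: Optional[int] = None
) -> list[tuple[str, float]]:
    """Fuse ranked lists of IDs with Reciprocal Rank Fusion.

    Each ID receives sum(1 / (k + rank)) over the lists it appears in,
    using 1-based ranks; a list that omits an ID contributes nothing.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by ID for deterministic output.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_n is None else fused[:top_n]


# Hypothetical rankings of card IDs from different retrievers and fields.
sparse_causes = ["c3", "c1", "c2"]
dense_causes = ["c1", "c2", "c4"]
sparse_summary = ["c1", "c4"]

fused = reciprocal_rank_fusion([sparse_causes, dense_causes, sparse_summary])
print([doc_id for doc_id, _ in fused])  # ['c1', 'c2', 'c4', 'c3']
```

Because RRF uses only ranks, it avoids calibrating BM25 scores against cosine similarities, which live on incomparable scales. It also rewards agreement: c2 appears in two lists and outranks c3, which tops only one.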
Topics
- Retrieval-Augmented Generation
- Maritime Accident Analysis
- Hybrid Retrieval
- Reciprocal Rank Fusion
- Knowledge Base Structuring
- Large Language Models
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.