A Hierarchical Multi-Agent System for Autonomous Discovery in Geoscientific Data Archives

2026-02-24 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Environmental Science & Earth Systems · Depth: Expert, extended

Summary

PANGAEA-GPT is a hierarchical multi-agent system for autonomous data discovery and analysis in geoscientific data archives such as PANGAEA, which hosts more than 400,000 curated datasets. The framework addresses the underuse of archived data with a Supervisor-Worker topology that combines data-type-aware routing, sandboxed deterministic code execution, and self-correction mechanisms. Its Search Agent refines queries iteratively in a ReAct loop and achieves a mean score of 8.14/10 on a 100-query benchmark, outperforming both baseline keyword matching and simple LLM query translation. Five specialist worker agents (Oceanographer, Ecologist, Visualization, DataFrame, and Writer) handle specific data types and tasks, which lets the system run complex, multi-step workflows in physical oceanography and ecology with minimal human intervention. In validation scenarios the system integrated data across domains, ran statistical analyses, and produced visualizations, autonomously resolving API errors and refining plot layouts along the way.
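
The Search Agent's iterative refinement follows the ReAct pattern of alternating reasoning and tool calls. The sketch below shows only the loop shape, under loud assumptions: the catalog, `search`, `refine`, and `react_search` are hypothetical, and the LLM "thought" step is replaced by a fixed rule that broadens an over-specific query by dropping its last term. It is not PANGAEA-GPT's implementation.

```python
from typing import Optional

# Hypothetical toy catalog standing in for the archive's search backend.
CATALOG = {
    "ds-001": "CTD profile temperature salinity Fram Strait 2019",
    "ds-002": "Zooplankton abundance net haul Arctic Ocean",
    "ds-003": "Sea surface temperature satellite North Atlantic",
    "ds-004": "CTD temperature salinity oxygen Weddell Sea",
}


def search(query: str) -> list[str]:
    """Action: return IDs of datasets whose description contains every query term."""
    terms = query.lower().split()
    return [ds_id for ds_id, text in CATALOG.items()
            if all(term in text.lower() for term in terms)]


def refine(query: str, hits: list[str]) -> Optional[str]:
    """Thought (rule-based stand-in for an LLM): stop once results exist,
    otherwise broaden the query by dropping its last term."""
    if hits:
        return None
    terms = query.split()
    return " ".join(terms[:-1]) if len(terms) > 1 else None


def react_search(query: str, max_steps: int = 5) -> tuple[list[str], list[str]]:
    """Alternate action (search), observation (hit count), and thought (refine)."""
    trace: list[str] = []
    hits: list[str] = []
    for _ in range(max_steps):
        hits = search(query)                                  # action
        trace.append(f"search({query!r}) -> {len(hits)} hits")  # observation
        next_query = refine(query, hits)                      # thought
        if next_query is None:
            break
        query = next_query
    return hits, trace
```

For example, `react_search("CTD temperature salinity Weddell Sea 2020")` finds nothing on the first pass, drops "2020", and returns `["ds-004"]` on the second. A real agent would let the model both narrow and broaden queries and judge relevance, rather than stopping at the first non-empty result.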

Key takeaway

For AI Researchers and Research Scientists developing autonomous systems for scientific data, PANGAEA-GPT's hierarchical multi-agent architecture offers a practical blueprint. Consider implementing data-type-aware routing, sandboxed execution, and multi-level self-correction (programmatic and visual) to make such systems more reliable and to reduce manual intervention in complex, heterogeneous data environments. The reported results suggest that this design improves both data discoverability and the automation of analytical workflows.
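
Data-type-aware routing means the supervisor picks a specialist from the kind of data and the kind of task in front of it. A minimal sketch, assuming a hypothetical routing table and stub workers in place of the LLM-backed agents: the five worker names come from the source, but the data-type labels, the `intent` values, and the rule that plotting and reporting intents override data type are all illustrative assumptions.

```python
from typing import Callable

SPECIALISTS = ("Oceanographer", "Ecologist", "Visualization", "DataFrame", "Writer")

# Hypothetical mapping from a dataset's data type to the specialist that handles it.
DATA_TYPE_ROUTES = {
    "ctd_profile": "Oceanographer",
    "gridded_netcdf": "Oceanographer",
    "species_abundance": "Ecologist",
    "generic_table": "DataFrame",
}


def route(data_type: str, intent: str) -> str:
    """Task intent wins for plotting and writing; otherwise route by data type."""
    if intent == "plot":
        return "Visualization"
    if intent == "report":
        return "Writer"
    # Unknown data types fall back to the generic tabular worker.
    return DATA_TYPE_ROUTES.get(data_type, "DataFrame")


def make_worker(name: str) -> Callable[[str], str]:
    # Stub standing in for an LLM-backed specialist agent.
    return lambda task: f"{name} handled: {task}"


WORKERS = {name: make_worker(name) for name in SPECIALISTS}


def supervise(plan: list[tuple[str, str, str]]) -> list[str]:
    """Dispatch each (task, data_type, intent) step to the routed worker."""
    return [WORKERS[route(data_type, intent)](task)
            for task, data_type, intent in plan]
```

Keeping the routing decision in one small function makes it easy to audit and test separately from the workers, which matters when the workers themselves are non-deterministic models.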

Key insights

A hierarchical multi-agent system autonomously discovers and analyzes geoscientific data, improving reuse and reducing manual effort.

Principles

Separate reasoning from execution for robust agent systems.
Iterative query refinement enhances search precision.
Self-correction via execution feedback improves reliability.
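
The first and third principles fit together in a single loop: a reasoning step proposes code, a separate execution step runs it, and any traceback goes back to the reasoning step as feedback. A minimal sketch, assuming a hypothetical `toy_propose` function in place of an LLM and plain `exec` in a persistent namespace; `exec` provides no security isolation, so this illustrates the feedback loop only, not a sandbox.

```python
import traceback
from typing import Callable, Optional


def execute(code: str, namespace: dict) -> Optional[str]:
    """Execution step: run code in a persistent namespace.
    Returns the formatted traceback on failure, or None on success."""
    try:
        exec(code, namespace)
    except Exception:
        return traceback.format_exc()
    return None


def self_correcting_run(propose: Callable[[str, Optional[str]], str],
                        task: str, max_attempts: int = 3) -> tuple[dict, int]:
    """Alternate proposing and executing until the code runs cleanly."""
    namespace: dict = {}
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = propose(task, error)          # reasoning: write or repair code
        error = execute(code, namespace)     # execution: run it, capture traceback
        if error is None:
            return namespace, attempt
    raise RuntimeError(f"gave up after {max_attempts} attempts:\n{error}")


def toy_propose(task: str, error: Optional[str]) -> str:
    # Hypothetical stand-in for an LLM: the first draft uses a wrong column
    # name; the repair reads the traceback and switches to the right one.
    data = "rows = [{'temp_c': 4.1}, {'temp_c': 3.7}]\n"
    if error is None:
        return data + "mean_t = sum(r['temperature'] for r in rows) / len(rows)"
    if "KeyError" in error:
        return data + "mean_t = sum(r['temp_c'] for r in rows) / len(rows)"
    raise ValueError("no repair strategy for this error")
```

Running `self_correcting_run(toy_propose, "mean temperature")` fails once with a `KeyError`, succeeds on the second attempt, and leaves `mean_t` (about 3.9) in the returned namespace.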

Method

PANGAEA-GPT employs a Supervisor-Worker architecture with data-type-aware routing to specialist agents. It uses a ReAct loop for search, sandboxed Python execution, and incorporates both programmatic traceback analysis and reflexive visual quality control for self-correction.
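
Reflexive visual quality control closes the loop on figures: critique the output, revise, and check again. The source describes inspecting the rendered plot; this sketch swaps in a rule-based critic over a hypothetical `PlotSpec` description of the figure, so it runs without a plotting library or vision model. The checks, thresholds, and fixes are illustrative assumptions, not the system's actual criteria.

```python
import textwrap
from dataclasses import dataclass, field, replace

NO_LEGEND = "no legend for multiple series"
CROWDED = "too many series for figure width"
LONG_TITLE = "title too long"


@dataclass
class PlotSpec:
    """Hypothetical description of a figure the Visualization agent produced."""
    title: str = ""
    series: list[str] = field(default_factory=list)
    legend: bool = False
    width_in: float = 6.0


def critique(spec: PlotSpec) -> list[str]:
    """Rule-based critic standing in for a visual review of the rendered plot."""
    issues = []
    if len(spec.series) > 1 and not spec.legend:
        issues.append(NO_LEGEND)
    if len(spec.series) > 4 and spec.width_in < 8.0:
        issues.append(CROWDED)
    if len(spec.title) > 60:
        issues.append(LONG_TITLE)
    return issues


def revise(spec: PlotSpec, issues: list[str]) -> PlotSpec:
    """Apply one fix per reported issue (stand-in for regenerating plot code)."""
    if NO_LEGEND in issues:
        spec = replace(spec, legend=True)
    if CROWDED in issues:
        spec = replace(spec, width_in=10.0)
    if LONG_TITLE in issues:
        spec = replace(spec, title=textwrap.shorten(spec.title, width=60))
    return spec


def refine_plot(spec: PlotSpec, max_rounds: int = 3) -> tuple[PlotSpec, int]:
    """Critique and revise until clean; return the final spec and rounds used."""
    rounds = 0
    while issues := critique(spec):
        if rounds == max_rounds:
            raise RuntimeError(f"unresolved issues: {issues}")
        spec = revise(spec, issues)
        rounds += 1
    return spec, rounds
```

The bounded round count matters in practice: a critic and a reviser that disagree could otherwise loop forever.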

In practice

Use data-type-aware routing for heterogeneous data.
Implement sandboxed execution for code safety and state persistence.
Integrate visual quality control for automated plot refinement.
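
Sandboxed execution with state persistence can be approximated by running each snippet in a fresh, isolated Python process that shares a private working directory for the session. A minimal sketch, assuming a hypothetical `SandboxSession` class: files written by one call survive for the next, which is how this version persists state. A child process with a timeout is only partial isolation; a production sandbox would add container- or OS-level limits on network, filesystem, and memory. The source does not say how PANGAEA-GPT implements its sandbox.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    returncode: Optional[int]   # None when the run was killed by the timeout
    stdout: str
    stderr: str
    timed_out: bool


class SandboxSession:
    """Runs each snippet in a new `python -I` process inside a private temp dir.
    The directory persists across calls, so saved files carry state forward."""

    def __init__(self, timeout_s: float = 10.0) -> None:
        self._dir = tempfile.TemporaryDirectory()
        self.timeout_s = timeout_s

    def run(self, code: str) -> RunResult:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: ignore env vars and user site
                cwd=self._dir.name,
                capture_output=True,
                text=True,
                timeout=self.timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child on timeout; partial output is discarded.
            return RunResult(None, "", "", True)
        return RunResult(proc.returncode, proc.stdout, proc.stderr, False)

    def close(self) -> None:
        self._dir.cleanup()
```

Because each call starts a clean interpreter, a crash or runaway loop cannot corrupt later steps, and the captured `stderr` can feed the traceback-driven self-correction loop.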

Topics

Hierarchical Multi-Agent Systems
Geoscientific Data Archives
Large Language Models
Autonomous Data Discovery
Self-Correction Mechanisms

Best for: AI Researcher, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.