Trustworthy Self-Composable Big-Data-as-a-Service: An LLM-Orchestrated Multi-Agent Framework for Automated Data Engineering, AutoML, MLOps Deployment, and Drift-Aware Lifecycle Optimization
Summary
The paper proposes a trustworthy, self-composable Big-Data-as-a-Service (BDaaS) framework that uses LLM-orchestrated multi-agent collaboration to automate the full lifecycle: data engineering, AutoML, MLOps deployment, and drift-aware optimization. The architecture decomposes the BDaaS workflow into specialized agents for data ingestion, cleaning, feature engineering, model training, evaluation, deployment, and monitoring. A central LLM orchestrator dynamically composes workflows, validates outputs, and manages context. The framework also integrates shared artifact governance, reproducibility support, human-in-the-loop checkpoints, and drift-aware feedback loops. In a prototype evaluation on controlled tabular datasets, it reached an average F1-score of 0.662, compared with 0.652 for a single-agent LLM, 0.644 for AutoML only, and 0.563 for manual ML. It also achieved 100.0% lifecycle completion, 100.0% artifact traceability, and 100.0% deployment readiness, and it recovered from drift within one monitoring window.
Key takeaway
MLOps engineers and AI architects building Big-Data-as-a-Service platforms should consider LLM-orchestrated multi-agent frameworks. The approach automates the lifecycle end to end, from data engineering to drift-aware MLOps deployment; in the paper's prototype evaluation it reached 100% lifecycle completion, artifact traceability, and deployment readiness. Add human-in-the-loop checkpoints and comprehensive artifact governance to balance automation efficiency with the oversight and reproducibility that production environments require.
Key insights
LLM-orchestrated multi-agent systems enable trustworthy, adaptive, and production-oriented Big-Data-as-a-Service lifecycle automation.
Principles
- Decompose complex BDaaS workflows into specialized agents.
- Centralize orchestration in an LLM that coordinates agents and validates outputs.
- Integrate artifact governance, human oversight, and drift-aware feedback.
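The governance and oversight principle can be illustrated with a minimal sketch. The source does not describe how artifacts are stored or approved, so everything here is an assumption: `ArtifactStore`, its methods, the content-hash IDs, and the reviewer callback are hypothetical names chosen for illustration, not the framework's API.

```python
import hashlib
import json
from typing import Callable, Dict, List, Sequence


class ArtifactStore:
    """Hypothetical in-memory store: content-addressed records with parent links."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}

    def register(self, kind: str, payload: dict, parents: Sequence[str] = ()) -> str:
        # Hash a canonical JSON form so the same payload always gets the same ID.
        body = json.dumps(payload, sort_keys=True).encode("utf-8")
        artifact_id = hashlib.sha256(body).hexdigest()[:12]
        missing = [p for p in parents if p not in self.records]
        if missing:
            raise KeyError(f"unknown parent artifacts: {missing}")
        self.records[artifact_id] = {
            "kind": kind,
            "parents": list(parents),
            "approved": False,
        }
        return artifact_id

    def lineage(self, artifact_id: str) -> List[str]:
        # Ancestors first, then the artifact itself, without duplicates.
        chain: List[str] = []
        for parent in self.records[artifact_id]["parents"]:
            for ancestor in self.lineage(parent):
                if ancestor not in chain:
                    chain.append(ancestor)
        chain.append(artifact_id)
        return chain

    def approve(self, artifact_id: str, reviewer: Callable[[dict], bool]) -> bool:
        # Human-in-the-loop checkpoint: the reviewer callback stands in for a person.
        decision = bool(reviewer(self.records[artifact_id]))
        self.records[artifact_id]["approved"] = decision
        return decision
```

Registering a dataset, then a model with that dataset as its parent, lets `lineage` trace the model back to its inputs, and `approve` records whether a reviewer signed off before deployment.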
Method
An LLM orchestrator decomposes tasks, selects agents, plans execution, and validates outputs. It coordinates the stages of data ingestion, cleaning, feature engineering, AutoML, MLOps deployment, and drift monitoring, with shared artifact governance and human checkpoints supporting each stage.
In practice
- Automate data ingestion, cleaning, and feature engineering.
- Deploy ML models with versioning and rollback capabilities.
- Implement continuous monitoring for data and concept drift.
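The versioning, rollback, and drift-monitoring items above can be sketched together. The source does not name a drift statistic or a recovery action, so this example swaps in the Population Stability Index (PSI) with a commonly used 0.2 rule-of-thumb threshold and treats rollback as the recovery step; `ModelRegistry`, `check_window`, and the threshold value are assumptions for illustration. It covers data drift on one numeric feature only, not concept drift.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `actual` against the `expected` reference sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0        # guard against a constant reference sample

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1   # clamp out-of-range values
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


class ModelRegistry:
    """Hypothetical registry: an ordered list of versions plus a pointer to the live one."""

    def __init__(self) -> None:
        self.versions: List[str] = []
        self.active = -1

    def deploy(self, version: str) -> None:
        self.versions.append(version)
        self.active = len(self.versions) - 1

    def rollback(self) -> str:
        if self.active <= 0:
            raise RuntimeError("no earlier version to roll back to")
        self.active -= 1
        return self.versions[self.active]

    @property
    def current(self) -> str:
        return self.versions[self.active]


def check_window(reference: Sequence[float], window: Sequence[float],
                 registry: ModelRegistry, threshold: float = 0.2) -> bool:
    """Return True if the window drifted past the threshold and a rollback was performed."""
    if psi(reference, window) > threshold:
        registry.rollback()
        return True
    return False
```

Running `check_window` once per monitoring window keeps the decision local to that window; a production system might retrain instead of rolling back, which this sketch does not attempt.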
Topics
- LLM Agents
- Big-Data-as-a-Service
- MLOps Automation
- Data Drift Detection
- Automated Data Engineering
- Artifact Governance
Best for: Research Scientist, AI Scientist, MLOps Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.