Trustworthy Self-Composable Big-Data-as-a-Service: An LLM-Orchestrated Multi-Agent Framework for Automated Data Engineering, AutoML, MLOps Deployment, and Drift-Aware Lifecycle Optimization

2026-06-17 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Robotics & Autonomous Systems · Depth: Expert, long

Summary

The article proposes a trustworthy, self-composable Big-Data-as-a-Service (BDaaS) framework that uses LLM-orchestrated multi-agent collaboration to automate data engineering, AutoML, MLOps deployment, and drift-aware lifecycle management. The architecture decomposes the BDaaS workflow into specialized agents for data ingestion, cleaning, feature engineering, model training, evaluation, deployment, and monitoring. A central LLM orchestrator dynamically composes workflows, validates outputs, and manages context. The framework also integrates shared artifact governance, reproducibility support, human-in-the-loop checkpoints, and drift-aware feedback loops. In a prototype evaluation on controlled tabular datasets, it reached an average F1-score of 0.662, compared with 0.652 for a single-agent LLM, 0.644 for AutoML only, and 0.563 for manual ML. It also achieved 100.0% lifecycle completion, 100.0% artifact traceability, and 100.0% deployment readiness, and it recovered from drift within one monitoring window.
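
To put the reported F1-scores in perspective, the short calculation below derives the absolute and relative gaps between the multi-agent framework and each baseline. The four scores come from the summary above; the dictionary keys and variable names are illustrative, and nothing here reproduces the original evaluation.

```python
# Average F1-scores as reported in the summary.
framework_f1 = 0.662
baselines = {
    "single_agent_llm": 0.652,
    "automl_only": 0.644,
    "manual_ml": 0.563,
}

# Absolute gap (F1 points) and relative gap (percent of the baseline score).
absolute_gap = {name: round(framework_f1 - f1, 3) for name, f1 in baselines.items()}
relative_gap_pct = {
    name: round((framework_f1 - f1) / f1 * 100, 1) for name, f1 in baselines.items()
}

for name in baselines:
    print(f"{name}: +{absolute_gap[name]:.3f} F1 ({relative_gap_pct[name]}% relative)")
```

The gap over the single-agent LLM and AutoML-only baselines is about 1.5% and 2.8% in relative terms, while the gap over manual ML is about 17.6%. Most of the reported advantage therefore lies in lifecycle completion, traceability, and deployment readiness rather than in raw predictive accuracy.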

Key takeaway

MLOps engineers and AI architects building Big-Data-as-a-Service platforms should consider LLM-orchestrated multi-agent frameworks. The approach automates the lifecycle end to end, from data engineering to drift-aware MLOps deployment, and the prototype evaluation reports full lifecycle completion, artifact traceability, and deployment readiness on controlled tabular datasets. Pair the automation with human-in-the-loop checkpoints and comprehensive artifact governance so that efficiency does not come at the cost of oversight and reproducibility in production.

Key insights

LLM-orchestrated multi-agent systems enable trustworthy, adaptive, and production-oriented Big-Data-as-a-Service lifecycle automation.

Principles

Decompose complex BDaaS workflows into specialized agents.
Coordinate agents and validate their outputs through a central LLM orchestrator.
Integrate artifact governance, human oversight, and drift-aware feedback.
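
The third principle can be illustrated with a minimal sketch of artifact governance plus a human checkpoint. The summary does not describe the framework's actual storage or approval mechanism, so `ArtifactStore`, `ArtifactRecord`, and `human_checkpoint` are hypothetical names, and content-hash identifiers are one assumed way of making artifacts traceable.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ArtifactRecord:
    artifact_id: str
    producer: str             # name of the agent that produced the artifact
    parents: Tuple[str, ...]  # ids of the artifacts it was derived from


class ArtifactStore:
    """Hypothetical shared store: every artifact gets a deterministic id and lineage."""

    def __init__(self) -> None:
        self._records: Dict[str, ArtifactRecord] = {}

    def register(self, producer: str, payload: Any, parents: Sequence[str] = ()) -> str:
        # Hash producer, payload, and parents so identical inputs map to the same id.
        body = json.dumps(
            {"producer": producer, "payload": payload, "parents": list(parents)},
            sort_keys=True,
        )
        artifact_id = hashlib.sha256(body.encode("utf-8")).hexdigest()[:12]
        self._records[artifact_id] = ArtifactRecord(artifact_id, producer, tuple(parents))
        return artifact_id

    def lineage(self, artifact_id: str) -> List[str]:
        # Walk from the artifact back to its roots, collecting producer names.
        seen, order, stack = set(), [], [artifact_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            record = self._records[current]
            order.append(record.producer)
            stack.extend(record.parents)
        return order


def human_checkpoint(artifact_id: str, approve: Callable[[str], bool]) -> str:
    # `approve` stands in for a human reviewer; the pipeline stops on rejection.
    if not approve(artifact_id):
        raise PermissionError(f"artifact {artifact_id} was rejected at the checkpoint")
    return artifact_id


store = ArtifactStore()
raw = store.register("ingestion", {"rows": 100})
features = store.register("feature_engineering", {"columns": 8}, parents=[raw])
model = store.register("training", {"algo": "gbm"}, parents=[features])
approved = human_checkpoint(model, approve=lambda artifact_id: True)
print(store.lineage(approved))
```

Because the identifier is derived from the content and the parent ids, re-running the same step on the same inputs yields the same id, which is one simple way to support reproducibility checks.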

Method

The framework uses an LLM orchestrator to decompose tasks, select agents, plan execution, and validate outputs across data ingestion, cleaning, feature engineering, AutoML, MLOps deployment, and drift monitoring, supported by shared artifact governance and human checkpoints.
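
The plan, execute, and validate loop described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Orchestrator` class, the agent functions, and `fixed_planner` are all hypothetical, and the planner is a plain function standing in for the LLM call that would choose and order agents in a real system.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical agent signature: read the shared context, return new artifacts.
Agent = Callable[[Dict[str, Any]], Dict[str, Any]]
Validator = Callable[[Dict[str, Any]], bool]


@dataclass
class Orchestrator:
    agents: Dict[str, Agent]
    planner: Callable[[str, List[str]], List[str]]  # stand-in for the LLM planning step
    validators: Dict[str, Validator] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def run(self, goal: str) -> Dict[str, Any]:
        context: Dict[str, Any] = {"goal": goal}
        plan = self.planner(goal, list(self.agents))  # select and order agents
        for step in plan:
            output = self.agents[step](context)
            check = self.validators.get(step, lambda out: True)
            if not check(output):
                self.log.append(f"{step}: rejected")
                raise ValueError(f"validation failed at step {step!r}")
            context.update(output)  # merge validated output into the shared context
            self.log.append(f"{step}: ok")
        return context


def ingest(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"rows": [1.0, None, 3.0, 5.0]}


def clean(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"rows": [r for r in ctx["rows"] if r is not None]}


def featurize(ctx: Dict[str, Any]) -> Dict[str, Any]:
    mean = sum(ctx["rows"]) / len(ctx["rows"])
    return {"features": [r - mean for r in ctx["rows"]]}


def fixed_planner(goal: str, available: List[str]) -> List[str]:
    # A real orchestrator would prompt an LLM here; this sketch uses a fixed order.
    return [name for name in ("ingest", "clean", "featurize") if name in available]


orchestrator = Orchestrator(
    agents={"ingest": ingest, "clean": clean, "featurize": featurize},
    planner=fixed_planner,
    validators={"clean": lambda out: len(out["rows"]) > 0},
)
result = orchestrator.run("prepare features")
print(result["features"], orchestrator.log)
```

Keeping the planner behind a plain callable makes the orchestration loop testable without a model in the loop, and the per-step validators give a natural place to attach the output checks the method describes.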

In practice

Automate data ingestion, cleaning, and feature engineering.
Deploy ML models with versioning and rollback capabilities.
Implement continuous monitoring for data and concept drift.
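
The last two practices can be combined in a small sketch: a versioned registry with rollback, and a per-window drift check. The summary does not say which drift statistic or recovery action the framework uses, so the Population Stability Index (PSI), the 0.2 alert threshold (a common rule of thumb), and rollback as the response are all assumptions made for illustration. `ModelRegistry` and `check_window` are hypothetical names. PSI on input features detects data drift only; concept drift would additionally need labels or prediction-quality signals.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


class ModelRegistry:
    """Hypothetical registry: ordered versions with a pointer to the active one."""

    def __init__(self) -> None:
        self._versions: List[str] = []
        self._active = -1

    def deploy(self, version: str) -> None:
        self._versions.append(version)
        self._active = len(self._versions) - 1

    def rollback(self) -> str:
        if self._active <= 0:
            raise RuntimeError("no earlier version to roll back to")
        self._active -= 1
        return self.active

    @property
    def active(self) -> str:
        return self._versions[self._active]


def check_window(registry: ModelRegistry, reference: Sequence[float],
                 window: Sequence[float], threshold: float = 0.2) -> bool:
    # Evaluate one monitoring window; roll back if the drift score exceeds the threshold.
    if psi(reference, window) > threshold:
        registry.rollback()
        return True
    return False


registry = ModelRegistry()
registry.deploy("v1")
registry.deploy("v2")

reference = [i / 100 for i in range(100)]      # training-time feature values
shifted = [0.5 + i / 100 for i in range(100)]  # production window with a mean shift
drifted = check_window(registry, reference, shifted)
print(drifted, registry.active)
```

In a drift-aware feedback loop, the same check could trigger retraining instead of rollback; rollback is used here only because it keeps the example self-contained.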

Topics

LLM Agents
Big-Data-as-a-Service
MLOps Automation
Data Drift Detection
Automated Data Engineering
Artifact Governance

Best for: Research Scientist, AI Scientist, MLOps Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.