A Multi-Agent system for Multi-Objective constrained optimization

2026-06-19 · Source: cs.MA updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Cloud Computing & IT Infrastructure · Depth: Expert, long

Summary

MAMO (Multi-Agent system for Multi-Objective constrained optimization), presented at AAMAS 2026 (Paphos, Cyprus, May 25-29, 2026), is a multi-agent reinforcement learning approach for balancing conflicting objectives in constrained optimization problems in dynamic computing and networking environments. Traditional methods often rely on manually selected reward weights. These weights strongly shape policy behavior, so a fixed manual choice makes it hard to reach an appropriate trade-off between cost optimization and constraint satisfaction, especially in non-stationary settings. MAMO decouples task execution from objective design and formulates the selection of reward weights as a learning problem in its own right. Its hierarchical architecture pairs a Task-Execution (TE) agent, which learns control policies, with a Weight-Adaptation (WA) agent, which observes long-term system indicators and dynamically adjusts the weighting coefficients so that the system can adapt autonomously as conditions evolve.
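
To make the role of the reward weights concrete, here is a minimal sketch of a weighted-sum reward. It is an illustration only: the `StepOutcome` fields, the two example actions, and the numeric weights are assumptions chosen for this example, not values or code from the paper. It shows how the same pair of actions can be ranked differently depending on the weights, which is the sensitivity that manual tuning has to deal with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepOutcome:
    cost: float       # hypothetical resource cost of the chosen action
    violation: float  # hypothetical amount by which a QoS constraint is exceeded


def scalarized_reward(outcome: StepOutcome, w_cost: float, w_violation: float) -> float:
    """Weighted-sum reward: higher is better, so both terms act as penalties."""
    return -(w_cost * outcome.cost + w_violation * outcome.violation)


# Two hypothetical actions: a cheap one that violates the constraint,
# and a more expensive one that satisfies it.
cheap = StepOutcome(cost=1.0, violation=2.0)
safe = StepOutcome(cost=3.0, violation=0.0)


def preferred(w_cost: float, w_violation: float) -> str:
    """Name of the action with the higher scalarized reward."""
    r_cheap = scalarized_reward(cheap, w_cost, w_violation)
    r_safe = scalarized_reward(safe, w_cost, w_violation)
    return "cheap" if r_cheap > r_safe else "safe"


print(preferred(w_cost=1.0, w_violation=0.5))  # cheap: -(1 + 1) = -2 beats -3
print(preferred(w_cost=1.0, w_violation=2.0))  # safe:  -(1 + 4) = -5 loses to -3
```

A small change in the violation weight flips the preferred action, so a policy trained under one weighting can behave very differently from one trained under another.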

Key takeaway

For Machine Learning Engineers designing RL solutions for dynamic, constrained optimization problems, MAMO offers an alternative to manual reward weight tuning. Consider a hierarchical multi-agent design like MAMO to adapt objective trade-offs autonomously, so that policies keep tracking the intended trade-off as environmental conditions or QoS requirements evolve. This approach can reduce fine-tuning effort and improve system resilience.

Key insights

MAMO learns reward weights for constrained multi-objective RL autonomously, decoupling task execution from objective design.

Principles

Decouple task execution from objective design.
Treat reward weight selection as a learning problem.
Use hierarchical agents for different time scales.

Method

MAMO uses a two-phase iterative workflow. First, the WA agent selects weights for a training horizon and the TE agent learns under them. Then the WA agent evaluates the resulting performance and adjusts the weights.
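
The two-phase workflow can be sketched with deliberately simple stand-ins. Everything below is an assumption made for illustration and not the authors' implementation: both agents are epsilon-greedy value learners, the toy system has four replica counts and a noisy demand, the cost weight is fixed at 1, the candidate violation weights are `CANDIDATE_W`, the WA score is "negative mean cost if mean violation stays under `max_violation`, otherwise a penalty", and the TE agent is reset at every round for simplicity.

```python
import random

ACTIONS = [1, 2, 3, 4]         # hypothetical replica counts
DEMAND = 3                     # hypothetical average demand
CANDIDATE_W = [0.2, 1.0, 5.0]  # hypothetical violation weights (cost weight = 1)


def step(action, rng):
    """Return (cost, violation) for one control step of the toy system."""
    demand = DEMAND + rng.choice([-1, 0, 1])
    return float(action), float(max(0, demand - action))


class TEAgent:
    """Epsilon-greedy value learner standing in for the task-execution policy."""

    def __init__(self, rng, eps=0.1, lr=0.1):
        self.q = {a: 0.0 for a in ACTIONS}
        self.rng, self.eps, self.lr = rng, eps, lr

    def act(self):
        if self.rng.random() < self.eps:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[a])

    def learn(self, action, reward):
        self.q[action] += self.lr * (reward - self.q[action])


class WAAgent:
    """Epsilon-greedy bandit over candidate weights, updated once per horizon."""

    def __init__(self, rng, eps=0.2):
        self.value = {w: 0.0 for w in CANDIDATE_W}
        self.count = {w: 0 for w in CANDIDATE_W}
        self.rng, self.eps = rng, eps

    def select(self):
        untried = [w for w in CANDIDATE_W if self.count[w] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.eps:
            return self.rng.choice(CANDIDATE_W)
        return max(CANDIDATE_W, key=lambda w: self.value[w])

    def update(self, w, score):
        self.count[w] += 1
        self.value[w] += (score - self.value[w]) / self.count[w]


def run(rounds=30, horizon=500, max_violation=0.5, seed=0):
    rng = random.Random(seed)
    wa = WAAgent(rng)
    for _ in range(rounds):
        w_violation = wa.select()           # phase 1: WA fixes the weights
        te, costs, viols = TEAgent(rng), [], []
        for _ in range(horizon):            # TE learns under those weights
            a = te.act()
            cost, viol = step(a, rng)
            te.learn(a, -(cost + w_violation * viol))
            costs.append(cost)
            viols.append(viol)
        mean_cost = sum(costs) / horizon    # phase 2: long-term indicators
        mean_viol = sum(viols) / horizon
        score = -mean_cost if mean_viol <= max_violation else -10.0 - mean_viol
        wa.update(w_violation, score)       # WA adjusts its weight preferences
    return wa.value


values = run()
print(max(values, key=values.get), values)
```

The point of the sketch is the separation of time scales: the TE agent updates at every control step under a fixed reward, while the WA agent updates only once per horizon and is scored on aggregate indicators rather than on the TE agent's shaped reward.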

In practice

Apply to edge-FaaS replica scaling.
Manage resource selection and workload scaling.
Adapt to non-stationary workload patterns.
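
As a rough illustration of why non-stationary workloads matter for replica scaling, the sketch below uses a toy request-rate model whose level jumps partway through. The workload shape, the per-replica capacity, and the helper names are all hypothetical and are not taken from the paper or its code. It shows that a replica count that is sufficient in one regime can be overloaded at every step in the next.

```python
import math

CAPACITY_PER_REPLICA = 10.0  # hypothetical requests/s one replica can serve


def workload(t: int) -> float:
    """Hypothetical request rate: a periodic cycle whose level jumps at t=100."""
    base = 20.0 if t < 100 else 60.0
    return base + 5.0 * math.sin(2 * math.pi * t / 50)


def evaluate(replicas: int, t_start: int, t_end: int):
    """Return (replica cost, fraction of steps where load exceeds capacity)."""
    steps = range(t_start, t_end)
    overloaded = sum(workload(t) > replicas * CAPACITY_PER_REPLICA for t in steps)
    return float(replicas), overloaded / len(steps)


def smallest_safe_replicas(t_start: int, t_end: int, max_overload: float = 0.0) -> int:
    """Smallest replica count whose overload fraction stays within the limit."""
    for replicas in range(1, 21):
        if evaluate(replicas, t_start, t_end)[1] <= max_overload:
            return replicas
    raise ValueError("no replica count in range satisfies the constraint")


print(smallest_safe_replicas(0, 100))    # 3 replicas cover the first regime
print(smallest_safe_replicas(100, 200))  # 7 replicas are needed after the jump
print(evaluate(3, 100, 200)[1])          # 1.0: the old setting is always overloaded
```

Any fixed trade-off between cost and constraint satisfaction tuned on the first regime would need revisiting after the shift, which is the kind of situation a weight-adaptation loop is meant to handle.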

Topics

Multi-Agent Reinforcement Learning
Constrained Optimization
Multi-Objective Optimization
Reward Shaping
Edge Computing
FaaS Resource Scaling

Code references

FFede0/RL4CC

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.