ALSO: Adversarial Online Strategy Optimization for Social Agents

2026-05-18 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

ALSO (Adversarial onLine Strategy Optimization) is a framework for online strategy optimization in multi-agent social simulations, where interactions are non-stationary because the agents adapt to one another as a conversation unfolds. Traditional Large Language Model (LLM)-based social agents rely on static personas, and existing offline reinforcement learning methods assume stationarity. ALSO instead formulates multi-turn interaction as an adversarial bandit problem. It treats each combination of a static persona and a dynamic strategy instruction as an "arm" and introduces a lightweight neural surrogate model that predicts rewards from interaction histories, enabling sample-efficient exploration and continuous online adaptation. In experiments on the Sotopia benchmark, ALSO consistently outperforms static baselines and other optimization methods, with a +16.60% overall improvement on Sotopia-Hard and a +83.79% gain in relationship outcomes; the gains are most pronounced in challenging, high-conflict scenarios.

Key takeaway

For research scientists developing socially intelligent LLM agents, ALSO offers a way to overcome the limitations of static personas in non-stationary environments. Consider an adversarial online strategy optimization approach that pairs a neural surrogate for reward prediction with an exponential-weights selector; on Sotopia, this combination improved adaptive social behavior and relationship outcomes in multi-agent simulations.

Key insights

ALSO enables LLM agents to adapt strategies online in dynamic social simulations by treating interactions as an adversarial bandit problem.

Principles

Social simulation is inherently non-stationary.
Static personas limit adaptive social intelligence.
Randomized selection is crucial in adversarial settings.
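
To make the last principle concrete: a deterministic selector can be anticipated and exploited by an adapting counterpart, so adversarial bandit algorithms sample arms from a distribution instead. The sketch below is textbook EXP3, a standard exponential-weights algorithm for adversarial bandits. The source only names an "exponential-weights selector", so the exact variant, the exploration rate `gamma`, and the class name `Exp3Selector` are assumptions made for illustration, not ALSO's implementation.

```python
import math
import random


class Exp3Selector:
    """Textbook EXP3: exponential weights mixed with uniform exploration.

    Illustrative only; the variant used by ALSO is not stated in the source.
    """

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma  # exploration rate in (0, 1], an assumed default
        self.log_weights = [0.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over log-weights (shifted by the max for numerical
        # stability), then mixed with the uniform distribution so that
        # every arm keeps probability at least gamma / n_arms.
        m = max(self.log_weights)
        exp_w = [math.exp(w - m) for w in self.log_weights]
        total = sum(exp_w)
        return [
            (1.0 - self.gamma) * e / total + self.gamma / self.n_arms
            for e in exp_w
        ]

    def select(self):
        # Randomized choice: sample an arm index from the distribution.
        probs = self.probabilities()
        return self.rng.choices(range(self.n_arms), weights=probs, k=1)[0]

    def update(self, arm, reward):
        # `reward` is assumed to be scaled to [0, 1].
        # Dividing by the selection probability gives an unbiased
        # estimate of the arm's reward despite bandit (partial) feedback.
        p = self.probabilities()[arm]
        estimate = reward / p
        self.log_weights[arm] += self.gamma * estimate / self.n_arms


if __name__ == "__main__":
    selector = Exp3Selector(n_arms=3)
    for _ in range(200):
        arm = selector.select()
        # Hypothetical environment: arm 2 pays more than the others.
        reward = 1.0 if arm == 2 else 0.2
        selector.update(arm, reward)
    print(selector.probabilities())
```

The uniform mixing term is what keeps every arm reachable, which matters when an arm that performed poorly early in a dialogue becomes useful after the other agent changes behavior.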

Method

ALSO combines an exponential-weights selector with a lightweight neural surrogate to predict rewards and generalize sparse feedback across strategies, updating online without modifying the base LLM.
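
One way to read this pairing is sketched below, under several loud assumptions. Only the pulled arm's reward is observed, the surrogate is fit on that single observation, and its predictions for all arms then drive a full-information exponential-weights update, so feedback on one persona-strategy pair informs others that share a persona or a strategy. The one-hot arm features, the tiny tanh network, the learning rates, and all names (`TinySurrogate`, `run_round`, `build_features`) are hypothetical. The source says the surrogate predicts rewards from interaction histories; history encoding is omitted here for brevity, and the base LLM is never touched.

```python
import math
import random
from itertools import product


class TinySurrogate:
    """One-hidden-layer tanh MLP trained by SGD on squared error.

    A pure-Python stand-in for a 'lightweight neural surrogate'.
    """

    def __init__(self, n_inputs, n_hidden=8, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def fit_one(self, x, target):
        # One SGD step on 0.5 * (prediction - target) ** 2.
        h = self._hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - target
        for j, hj in enumerate(h):
            # Backpropagate through tanh using the pre-update w2[j].
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= self.lr * err * hj
            self.b1[j] -= self.lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * grad_pre * xi
        self.b2 -= self.lr * err


def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]


def build_features(n_personas, n_strategies):
    # One feature vector per (persona, strategy) arm.
    return [one_hot(p, n_personas) + one_hot(s, n_strategies)
            for p, s in product(range(n_personas), range(n_strategies))]


def hedge_probabilities(log_weights):
    m = max(log_weights)
    exp_w = [math.exp(w - m) for w in log_weights]
    total = sum(exp_w)
    return [e / total for e in exp_w]


def run_round(features, log_weights, surrogate, reward_fn, rng, eta=0.5):
    """Sample an arm, observe its reward, refit, update all arm weights."""
    probs = hedge_probabilities(log_weights)
    chosen = rng.choices(range(len(features)), weights=probs, k=1)[0]
    reward = reward_fn(chosen)  # only the pulled arm is observed
    surrogate.fit_one(features[chosen], reward)
    for k, x in enumerate(features):
        # Clip predictions to [0, 1] before the exponential-weights step.
        predicted = min(1.0, max(0.0, surrogate.predict(x)))
        log_weights[k] += eta * predicted
    return chosen, reward
```

A driver loop would call `run_round` once per episode or turn with a `reward_fn` that runs the dialogue and scores it; that scoring function is outside this sketch.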

In practice

Augment LLM personas with dynamic strategy instructions.
Use a neural surrogate for sample-efficient reward prediction.
Apply exponential decay to prioritize recent interaction feedback.
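
The first and third practices above can be sketched in a few lines. The example builds arms as the Cartesian product of personas and strategy instructions, then weights a reward history so that recent feedback counts more. The prompt template, the decay factor of 0.9, and the function names are assumptions for illustration; the source does not say where ALSO applies the decay (for example, in the surrogate's training loss or in the selector's weights).

```python
from itertools import product


def build_arms(personas, strategies):
    """Pair every static persona with every dynamic strategy instruction."""
    # The prompt template below is a hypothetical format.
    return [
        f"{persona}\n\nStrategy for this turn: {strategy}"
        for persona, strategy in product(personas, strategies)
    ]


def decay_weights(n_observations, decay=0.9):
    """Weights for a history ordered oldest to newest; newest gets 1.0."""
    return [decay ** (n_observations - 1 - k) for k in range(n_observations)]


def decayed_mean(rewards, decay=0.9):
    """Exponentially decayed average that favors recent rewards."""
    if not rewards:
        raise ValueError("rewards must be non-empty")
    weights = decay_weights(len(rewards), decay)
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


if __name__ == "__main__":
    arms = build_arms(
        ["You are a patient mediator.", "You are a firm negotiator."],
        ["Acknowledge the other side first.", "Propose a concrete compromise."],
    )
    print(len(arms))  # 2 personas x 2 strategies
    print(decayed_mean([0.1, 0.2, 0.9]))
```

The same weights could be used as per-sample loss weights when refitting a reward model, so that stale interactions influence the prediction less than recent ones.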

Topics

Adversarial Bandit Problem
Online Strategy Optimization
LLM-based Social Agents
Multi-agent Social Simulation
Neural Surrogate Models

Code references

Babylonehy/ALSO

Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.