Adversarial Feeds Steer LLM Agent Decisions Against Their Defaults

2026-05-30 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A new protocol shows that external information feeds significantly steer LLM agent decisions, even when the model, persona, topic, and final prompt are held fixed. The researchers ran 2,785 decision rollouts across four open instruct LLMs from three independent labs, isolating the causal effect of feed composition and ordering during a ten-turn "scrolling" phase. They identify three phenomena: "adversarial capitulation," "default saturation," and a "default-direction asymmetry." Under this asymmetry, a one-sided feed can move the rate of a decision from 5% to 100% (Fisher p as low as 3 × 10^-10) when the model is uncertain, but cannot dislodge a firmly held default. The effect follows a dose-response curve, generalizes to security-relevant choices such as relaxing access controls, and is partly mitigated by simple feed-level defenses; frontier models retain their defaults. The study characterizes the recommender as a practical, default-bounded control surface and argues that agent evaluations need to audit the feed layer.
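To give a sense of what a Fisher p-value at this scale means, here is a minimal worked sketch of a two-sided Fisher exact test on a 2x2 table. The counts are an assumption made for illustration: the summary does not report per-condition sample sizes, so 20 rollouts per arm (1 of 20 under a neutral feed, 20 of 20 under a one-sided feed) is hypothetical, chosen only because it corresponds to 5% versus 100%.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts (NOT from the paper): 20 rollouts per arm.
# Rows: neutral feed, one-sided feed. Columns: chose option, did not.
p_value = fisher_exact_two_sided(1, 19, 20, 0)
print(f"p = {p_value:.2e}")
```

With these assumed counts the test gives a p-value on the order of 10^-10, which shows how a jump from 5% to 100% can reach that level of significance even with a modest number of rollouts per condition.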

Key takeaway

If you are an AI security engineer or part of an MLOps team deploying LLM agents, extend your safety evaluations beyond model prompts to the upstream information feeds. Curated content can sway your agents' decisions, including security-critical ones such as access controls, moving outcome rates from 5% to 100% when the model holds no firm default. Implement feed-level defenses and audit the recommender layer to prevent adversarial steering and keep your agents operating as intended.

Key insights

External information feeds significantly influence LLM agent decisions, acting as a control surface independent of the core model.

Principles

LLM agent decisions exhibit adversarial capitulation and default saturation.
Feed influence follows a dose-response curve and shows default-direction asymmetry.
Frontier models retain their defaults under feed manipulation, making them more resilient than the open instruct LLMs tested.

Method

A controlled protocol fixes model, persona, topic, and final prompt, varying only the composition and ordering of posts in a ten-turn "scrolling" phase to isolate feed effects.
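The protocol above can be sketched as a small feed builder. This is an illustrative sketch, not the authors' code: the function name `build_feed`, the stance labels "pro" and "con", the ordering options, and the placeholder posts are all assumptions. Only the ten-turn length and the idea of varying composition and ordering while holding everything else fixed come from the summary.

```python
import random


def build_feed(pro_posts, con_posts, n_pro, n_turns=10, order="shuffled", seed=0):
    """Return n_turns (stance, text) pairs containing exactly n_pro 'pro' posts.

    Hypothetical helper: composition is set by n_pro, ordering by `order`.
    """
    if not 0 <= n_pro <= n_turns:
        raise ValueError("n_pro must lie between 0 and n_turns")
    rng = random.Random(seed)  # seeded so each condition is reproducible
    pro = [("pro", text) for text in rng.sample(pro_posts, n_pro)]
    con = [("con", text) for text in rng.sample(con_posts, n_turns - n_pro)]
    if order == "pro_first":
        return pro + con
    if order == "con_first":
        return con + pro
    if order == "shuffled":
        feed = pro + con
        rng.shuffle(feed)
        return feed
    raise ValueError(f"unknown order: {order!r}")


# Placeholder post pools (assumed; real posts would be topic-specific).
PRO_POSTS = [f"placeholder post favoring the option #{i}" for i in range(10)]
CON_POSTS = [f"placeholder post opposing the option #{i}" for i in range(10)]

# A dose-response sweep: 0 to 10 'pro' posts out of ten turns, with the
# model, persona, topic, and final prompt held fixed elsewhere.
sweep = {k: build_feed(PRO_POSTS, CON_POSTS, n_pro=k) for k in range(11)}
```

Each entry in `sweep` would be shown to the agent during the scrolling phase before the same final prompt, so that any change in the decision rate can be attributed to the feed alone.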

In practice

Implement simple feed-level defenses to mitigate adversarial steering.
Audit the feed layer for LLM agents, not just the final prompt.
Consider feed curation in security-relevant LLM agent deployments.
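One way to start auditing the feed layer is a one-sidedness check on what the agent is about to read. The summary does not say which feed-level defenses the study tested, so the sketch below is an assumption, not the paper's method: it presumes each post already carries a stance label (for example from a separate classifier), and the 0.7 threshold and the names `one_sidedness` and `audit_feed` are hypothetical.

```python
from collections import Counter


def one_sidedness(stances):
    """Share of the feed held by its most common stance (1.0 = fully one-sided)."""
    if not stances:
        return 0.0
    counts = Counter(stances)
    return max(counts.values()) / len(stances)


def audit_feed(stances, max_share=0.7):
    """Flag a feed whose dominant stance exceeds max_share (assumed threshold)."""
    share = one_sidedness(stances)
    return {"dominant_share": share, "flagged": share > max_share}


# Example: a fully one-sided ten-post feed versus a balanced one.
print(audit_feed(["pro"] * 10))
print(audit_feed(["pro"] * 5 + ["con"] * 5))
```

A flagged feed could be logged, rebalanced, or withheld before the agent reaches a security-relevant decision; which response is appropriate depends on the deployment.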

Topics

LLM Agents
Adversarial Attacks
Feed Curation
Recommender Systems
AI Safety
Access Control

Best for: Research Scientist, AI Architect, CTO, AI Scientist, AI Security Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.