How Veris AI and Lume Security built a self-improving AI agent with Microsoft Foundry

2026-03-31 · Source: Microsoft Foundry Blog articles · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Cybersecurity & Data Privacy · Depth: Intermediate, long

Summary

Veris AI and Lume Security, working with Microsoft Foundry, built a self-improving AI agent system intended to move agents from demos into production. It targets failure modes that only surface in production: a high-fidelity simulation environment, built by Veris AI on Microsoft Azure, expands each production failure into a family of realistic scenarios and generates targeted data for optimizing agent behavior through automated context engineering and reinforcement learning. The approach is designed to deliver improvements without regressing on previously fixed issues. It is demonstrated on a security agent from Lume Security, which uses an intelligence graph to power policy-aligned agents for security, compliance, and IT workflows, reducing time spent on routine requests by 35-55% and improving decision-making.

Key takeaway

For AI engineers building production-grade agents: invest early in an orchestration and safety layer and in an environment-driven evaluation system. Together they form a continuous improvement loop in which production failures, the highest-signal input available, drive fixes that ship without regressions.

Key insights

High-fidelity simulation and automated optimization enable AI agents to self-improve safely in production.

Principles

Agent evaluation must grade outcomes, not just answers.
Production failures are high-signal inputs for system hardening.
Orchestration layers standardize model usage and safety.
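
The first principle can be illustrated with a minimal sketch. The Episode record, its fields, and both grader functions are hypothetical names chosen for illustration and are not taken from the original article. The point is only that an outcome grader inspects the state the agent left behind in the environment, while an answer grader inspects what the agent said.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Episode:
    """Hypothetical record of one agent run in a simulated environment."""
    final_answer: str
    env_state: Dict[str, object] = field(default_factory=dict)


def grade_answer(episode: Episode) -> bool:
    # Answer-only check: passes as soon as the agent claims success.
    return "access granted" in episode.final_answer.lower()


def grade_outcome(episode: Episode, expected_state: Dict[str, object]) -> bool:
    # Outcome check: every expected key must hold the expected value
    # in the environment's final state.
    return all(episode.env_state.get(k) == v for k, v in expected_state.items())


# The agent says the right thing but never actually changed the permission.
episode = Episode(
    final_answer="Access granted to the repository.",
    env_state={"repo_access": False, "ticket_logged": True},
)
expected = {"repo_access": True, "ticket_logged": True}
```

Here the answer grader would pass the episode and the outcome grader would fail it, which is the gap the principle is pointing at.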

Method

The system reconstructs production failures, expands them into scenario variants, stress-tests agents in simulation, and uses grader signals to refine prompts and apply reinforcement learning updates in a closed loop.
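
The loop described above can be sketched in a few lines, under heavy simplifying assumptions. Everything here is hypothetical: the agent is a toy that passes a scenario only when its prompt rules contain the rule that scenario needs, the grader returns the missing rule as feedback, and prompt refinement simply adds that rule. The reinforcement learning step is omitted entirely. This is not the Veris AI implementation, only the shape of a reconstruct, expand, test, refine cycle.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Scenario:
    name: str
    required_rule: str  # rule the prompt must contain for the toy agent to pass


def expand_failure(failure: Scenario, contexts: List[str]) -> List[Scenario]:
    """Expand one reconstructed failure into a family of scenario variants."""
    return [
        Scenario(f"{failure.name}/{ctx}", f"{failure.required_rule} ({ctx})")
        for ctx in contexts
    ]


def run_agent(rules: FrozenSet[str], scenario: Scenario) -> bool:
    # Toy stand-in for a simulated agent run.
    return scenario.required_rule in rules


def grade(rules: FrozenSet[str], scenario: Scenario) -> Optional[str]:
    # Grader signal: None on success, otherwise the rule that was missing.
    return None if run_agent(rules, scenario) else scenario.required_rule


def improve(rules: FrozenSet[str], scenarios: List[Scenario],
            max_rounds: int = 5) -> FrozenSet[str]:
    """Stress-test, collect grader feedback, refine the prompt, repeat."""
    for _ in range(max_rounds):
        feedback = set()
        for scenario in scenarios:
            signal = grade(rules, scenario)
            if signal is not None:
                feedback.add(signal)
        if not feedback:
            break  # every variant passes; stop refining
        rules = rules | feedback
    return rules


failure = Scenario("offboarding", "revoke access")
variants = expand_failure(failure, ["contractor", "admin"])
rules = improve(frozenset(), variants)
```

In this toy the loop converges after one refinement round; a real system would need a far less direct mapping from grader feedback to prompt changes.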

In practice

Use simulation to expand rare production failures.
Implement LLM-based evaluators for targeted rubrics.
Validate prompt updates against regression suites.
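
The last item can be sketched as a simple acceptance gate. The function name accept_update, its parameters, and the toy_run stand-in are assumptions made for illustration; the original article does not specify how its regression check is implemented. The gate accepts a candidate prompt only if it breaks no case the baseline already passed and fixes at least one targeted case.

```python
from typing import Callable, Iterable


def accept_update(
    run: Callable[[str, str], bool],
    baseline: str,
    candidate: str,
    regression_suite: Iterable[str],
    target_cases: Iterable[str],
) -> bool:
    """Accept a candidate prompt only if it fixes a target without regressing."""
    # A regression is a case the baseline passed that the candidate now fails.
    regressions = [
        case for case in regression_suite
        if run(baseline, case) and not run(candidate, case)
    ]
    # The update must also fix at least one of the failures it was aimed at.
    fixed = [case for case in target_cases if run(candidate, case)]
    return not regressions and len(fixed) > 0


def toy_run(prompt: str, case: str) -> bool:
    # Hypothetical stand-in for running the agent: a case passes when the
    # prompt mentions the behavior the case requires.
    return case in prompt


baseline = "verify identity"
good_candidate = "verify identity; check policy"
bad_candidate = "check policy"  # fixes the target but drops an old behavior

suite = ["verify identity"]
targets = ["check policy"]
```

With these inputs the gate would accept good_candidate and reject bad_candidate, since the latter fails a case the baseline handled.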

Topics

AI Agent Optimization
High-Fidelity Simulation
Security Intelligence Graph
Microsoft Foundry
Automated Evaluation

Best for: AI Engineer, MLOps Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.