Women, Peace and Security Frameworks Must Apply to Defense AI

Source: Tech Policy Press · Field: Government & Public Sector (Public Policy & Governance, Public Safety & Security, Artificial Intelligence & Machine Learning) · Depth: Intermediate, long

Summary

AI tools are already deployed in conflict zones: Project Maven has been used for targeting in Iraq, Syria, Yemen, and Ukraine, and AI-generated intelligence has informed strikes in Gaza. Although states have adopted Women, Peace and Security (WPS) frameworks to guide security decisions, they do not consistently apply those commitments to the AI systems that inform those decisions. Research by Our Secure Future's Project Delphi found that commercially available large language models (LLMs) fail to operationalize WPS standards, producing a "WPS competence gap." The gap is a measurable drop in performance when WPS language is absent from prompts. In that case, models omit considerations critical to over 50 percent of the global population. Customizing and evaluating AI models with a robust WPS perspective, including explicit instructions and retrieval augmentation with WPS data and policy frameworks, significantly improves accuracy and decision-making in conflict scenarios.

Key takeaway

If you are a defense organization or policymaker procuring AI for conflict-related applications, demand evidence that the models are configured and validated against existing policy frameworks such as WPS. Relying on out-of-the-box AI tools, which demonstrably fail to integrate critical human security considerations, introduces significant operational and ethical risks. Make the WPS AI Benchmark a contractual requirement so that vendors deliver models that meet your established commitments, improving both decision-making and civilian protection.

Key insights

Commercially available AI models used in conflict settings fail to integrate Women, Peace and Security (WPS) standards unless WPS language appears in the prompt, creating a critical "WPS competence gap."

Principles

Method

Test AI models across conflict scenarios, varying both the amount of contextual detail and the presence of WPS language in the prompts, then score the responses against a WPS-specific rubric. The difference in scores reveals performance gaps and points to customization pathways.
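
The method above can be sketched as a small evaluation harness. This is a minimal illustration, not Project Delphi's actual benchmark: the rubric criteria, the keyword lists, the prompt wording, and the names `RUBRIC`, `score_response`, `build_prompt`, `competence_gap`, and `stub_model` are all hypothetical. Keyword matching stands in for whatever scoring the real rubric uses, which would more plausibly rely on expert raters.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical rubric: each criterion lists phrases that count as evidence
# the response addressed it. A real rubric would be far richer than this.
RUBRIC: Dict[str, List[str]] = {
    "participation": ["women's participation", "women-led"],
    "protection": ["gender-based violence", "sexual violence"],
    "relief_and_recovery": ["displaced women", "maternal health"],
}


def score_response(text: str) -> float:
    """Return the fraction of rubric criteria the response touches."""
    lowered = text.lower()
    hits = sum(
        any(phrase in lowered for phrase in phrases)
        for phrases in RUBRIC.values()
    )
    return hits / len(RUBRIC)


def build_prompt(scenario: str, with_wps: bool) -> str:
    """Build a prompt for a scenario, optionally adding WPS language."""
    prompt = f"Advise on the following scenario: {scenario}"
    if with_wps:
        prompt += " Apply the Women, Peace and Security framework."
    return prompt


def competence_gap(model: Callable[[str], str], scenarios: List[str]) -> float:
    """Mean rubric score with WPS language minus mean score without it."""
    with_wps = mean(score_response(model(build_prompt(s, True))) for s in scenarios)
    without = mean(score_response(model(build_prompt(s, False))) for s in scenarios)
    return with_wps - without


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call; only raises WPS issues when asked to."""
    if "Women, Peace and Security" in prompt:
        return (
            "Plan for women's participation in talks and mitigate "
            "gender-based violence near checkpoints."
        )
    return "Secure supply routes and establish checkpoints."


if __name__ == "__main__":
    scenarios = ["Ceasefire monitoring in a border town", "Evacuation of a besieged city"]
    print(f"Competence gap: {competence_gap(stub_model, scenarios):.2f}")
```

A positive gap means the model performs better only when WPS language is supplied, which is the pattern the summary describes. Swapping `stub_model` for a call to a customized model, one given explicit instructions or retrieval augmentation, would show whether the gap narrows.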

In practice

Best for: CTO, VP of Engineering/Data, Executive, AI Ethicist, Policy Maker, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Tech Policy Press.