Women, Peace and Security Frameworks Must Apply to Defense AI
Summary
AI tools are already deployed in conflict zones: Project Maven has supported targeting in Iraq, Syria, Yemen, and Ukraine, and AI-generated intelligence has informed strikes in Gaza. Although states have adopted Women, Peace and Security (WPS) frameworks to guide security decisions, they do not consistently apply those commitments to the AI systems that inform those decisions. Research by Our Secure Future's Project Delphi found that commercially available large language models (LLMs) fail to operationalize WPS standards. The research describes a "WPS competence gap": a measurable drop in performance when WPS language is absent from prompts. In practice, the gap means models omit considerations that are critical for more than 50 percent of the global population. Customizing and evaluating AI models with a robust WPS perspective, including explicit instructions and retrieval augmentation with WPS data and policy frameworks, significantly improves accuracy and decision-making in conflict scenarios.
Key takeaway
If you are a defense organization or policymaker procuring AI for conflict-related applications, demand evidence that AI models are configured and validated against existing policy frameworks such as WPS. Relying on out-of-the-box AI tools, which demonstrably fail to integrate critical human security considerations, introduces significant operational and ethical risks. Make the WPS AI Benchmark a contractual requirement so that vendors deliver models that meet your established commitments, improving both decision-making and civilian protection.
Key insights
AI models in conflict zones fail to integrate Women, Peace and Security (WPS) standards, creating a critical "WPS competence gap."
Principles
- AI systems must comply with existing policy frameworks.
- Explicit prompting improves AI performance in policy adherence.
- Configuration, not capability, is key to AI policy compliance.
Method
AI models are tested across conflict scenarios using prompts that vary in contextual detail and in whether they include WPS language. Responses are then scored against a WPS-specific rubric, which reveals performance gaps and customization pathways.
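The method above can be sketched as a small evaluation loop. This is a minimal illustration, not Project Delphi's actual benchmark: the rubric criteria, indicator terms, scenarios, keyword-based scoring, and the `stub_model` function are all hypothetical stand-ins chosen to show how a gap between prompts with and without WPS language could be measured.

```python
from itertools import product
from statistics import mean
from typing import Callable

# Hypothetical rubric: criterion -> indicator terms (illustrative only,
# not the real WPS AI Benchmark rubric).
RUBRIC = {
    "participation": ["women's participation", "women-led"],
    "protection": ["sexual violence", "gender-based violence"],
    "gender_analysis": ["sex-disaggregated", "gender analysis"],
}

# Hypothetical scenarios used only for this sketch.
SCENARIOS = [
    "Recommend a checkpoint placement plan.",
    "Assess risks of a planned evacuation.",
]


def build_prompt(scenario: str, detail: str, wps_language: bool) -> str:
    """Vary contextual detail and the presence of WPS language."""
    parts = [scenario]
    if detail == "high":
        parts.append("Context: displaced civilians are sheltering nearby.")
    if wps_language:
        parts.append("Apply Women, Peace and Security (WPS) commitments.")
    return " ".join(parts)


def score_response(response: str) -> float:
    """Fraction of rubric criteria with at least one indicator term present."""
    text = response.lower()
    hits = sum(any(term in text for term in terms) for terms in RUBRIC.values())
    return hits / len(RUBRIC)


def competence_gap(model: Callable[[str], str], scenarios: list) -> float:
    """Mean score with WPS language minus mean score without it."""
    scores = {True: [], False: []}
    for scenario, detail, wps in product(scenarios, ["low", "high"], [True, False]):
        response = model(build_prompt(scenario, detail, wps))
        scores[wps].append(score_response(response))
    return mean(scores[True]) - mean(scores[False])


def stub_model(prompt: str) -> str:
    """Stand-in for an LLM call: only addresses WPS when prompted to."""
    if "Women, Peace and Security" in prompt:
        return ("Ensure women's participation, address sexual violence risks, "
                "and use sex-disaggregated data.")
    return "Proceed with the plan as drafted."


gap = competence_gap(stub_model, SCENARIOS)
```

With the stub, the gap is 1.0 by construction, because the stand-in model covers every criterion only when WPS language appears in the prompt. A real evaluation would replace `stub_model` with calls to the model under test and the keyword check with expert or rubric-based grading.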
In practice
- Integrate WPS benchmarks into AI procurement requirements.
- Empower WPS advisors to influence AI tool development.
- Use structured prompts to improve LLM performance.
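As one possible reading of "structured prompts" combined with the explicit instructions and retrieval augmentation mentioned in the summary, the sketch below assembles a prompt from a role, an explicit WPS instruction, retrieved policy passages, and the task. The section labels, the word-overlap retriever, and the placeholder passages are assumptions made for illustration; a real system would draw on actual WPS data and policy texts and a proper retriever.

```python
# Placeholder passages standing in for real WPS policy text (hypothetical).
PASSAGES = [
    "PLACEHOLDER: guidance on protection of displaced women and girls during evacuation",
    "PLACEHOLDER: guidance on women's participation in ceasefire negotiation",
    "PLACEHOLDER: guidance on sex-disaggregated casualty reporting",
]


def retrieve(query: str, passages: list, k: int = 2) -> list:
    """Naive retrieval: rank passages by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def structured_prompt(task: str, passages: list, k: int = 2) -> str:
    """Assemble role, explicit instruction, retrieved context, and task."""
    context = "\n".join(f"- {p}" for p in retrieve(task, passages, k))
    return (
        "Role: You are an analyst supporting conflict-related planning.\n"
        "Instruction: Explicitly apply Women, Peace and Security (WPS) "
        "commitments and state how each recommendation affects women and girls.\n"
        f"Policy context:\n{context}\n"
        f"Task: {task}"
    )


task = "Plan an evacuation route for displaced civilians"
prompt = structured_prompt(task, PASSAGES)
```

Keeping the instruction and policy context in fixed, labeled sections makes it easier to audit what the model was told, which supports the procurement goal of showing that a tool was configured against the framework.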
Topics
- Women, Peace and Security Frameworks
- Defense AI
- Large Language Models
- AI Procurement
- WPS Competence Gap
Best for: CTO, VP of Engineering/Data, Executive, AI Ethicist, Policy Maker, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Tech Policy Press.