The patch model is breaking. AI evaluation needs a new way to disclose what it finds.

2026-06-09 · Source: MLCommons · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Advanced, short

Summary

The "patch model" of coordinated vulnerability disclosure, a 30-year standard in software security, is breaking down for AI systems, according to MLCommons. The model assumes that affected systems can be repaired, which ends the hazard. For AI evaluation findings it fails for three reasons. First, the findings are dual-use: results that help defenders also lower the cost of exploitation for adversaries. Second, overly specific feedback to developers corrupts benchmark integrity, because a model can then improve on the test without improving in general. Third, released open-weight models cannot be patched: a new version is a distinct artifact, and vulnerable copies of the earlier version remain in operation indefinitely. MLCommons is developing new disclosure practices for its safety and jailbreak benchmarks and contributing these principles to ISO/IEC TS 42119-8 within ISO/IEC JTC 1/SC 42, with the aim of establishing a citable, field-wide approach to responsible AI evaluation disclosure.

Key takeaway

AI security engineers evaluating frontier models should recognize that traditional vulnerability disclosure models are insufficient. Findings are dual-use, and open-weight models cannot be patched, so identified hazards persist indefinitely. Align your disclosure policies with emerging standards such as ISO/IEC TS 42119-8 to guard against harmful uplift and preserve evaluation integrity. Consider joining efforts such as MLCommons' agentic security working group to shape future disclosure norms.

Key insights

The traditional software patch model fails for AI due to dual-use findings, test corruption, and unpatchable open-weight models.

Principles

AI evaluation findings are inherently dual-use.
Specific test feedback corrupts benchmark integrity.
Open-weight models cannot be centrally remediated.

Method

MLCommons is designing disclosure practices for its benchmarks and contributing principles to ISO/IEC TS 42119-8 within ISO/IEC JTC 1/SC 42 to codify a new standard.

In practice

Pin findings to specific model versions.
Aggregate or withhold sensitive results.
Align disclosure policies with emerging standards.
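
The first two practices can be illustrated with a minimal Python sketch. It is not MLCommons' method or anything specified in ISO/IEC TS 42119-8. The `Finding` record, the `public_summary` function, the field names, and the `min_count` threshold are all hypothetical, chosen only to show one way a finding might be pinned to an exact model artifact (here, a digest of the evaluated weights) and then published only in aggregate, with prompts and small-sample cells withheld.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One evaluation result, pinned to the exact artifact that was tested."""
    model_name: str        # hypothetical model identifier
    weights_sha256: str    # digest of the evaluated weights (assumed pinning scheme)
    hazard_category: str   # coarse category that is safe to publish
    prompt: str            # sensitive detail; never copied into the public summary
    succeeded: bool        # whether the attack or unsafe behaviour was observed


def public_summary(findings, min_count=5):
    """Aggregate findings per (model, weights digest, category).

    Prompts are dropped entirely. Cells with fewer than `min_count`
    attempts have their success rate withheld (an illustrative threshold).
    """
    totals = Counter()
    hits = Counter()
    for f in findings:
        key = (f.model_name, f.weights_sha256, f.hazard_category)
        totals[key] += 1
        hits[key] += int(f.succeeded)

    rows = []
    for key in sorted(totals):
        model, digest, category = key
        attempts = totals[key]
        withheld = attempts < min_count
        rows.append({
            "model": model,
            "weights_sha256": digest,
            "category": category,
            "attempts": attempts,
            "success_rate": None if withheld else round(hits[key] / attempts, 2),
            "withheld": withheld,
        })
    return rows
```

Keying each row on the weights digest rather than the model name alone reflects the point that a new release is a distinct artifact: a finding against one set of weights says nothing certain about the next, and it stays valid for every copy of the old weights still in use.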

Topics

AI Security
Vulnerability Disclosure
AI Evaluation
Open-weight Models
MLCommons
ISO/IEC Standards
Jailbreak Benchmarks

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, AI Ethicist, Policy Maker

Editorial summary, takeaway, and curation by AIssential. Original article published by MLCommons.