The patch model is breaking. AI evaluation needs a new way to disclose what it finds.

· Source: MLCommons · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Advanced, short

Summary

The traditional "patch model" for coordinated vulnerability disclosure, a standard in software security for some 30 years, is breaking down for AI systems, according to MLCommons. The model assumes that an affected system can be repaired and that the repair ends the hazard. AI evaluation findings break this assumption in three ways. First, they are dual-use: results that help defenders also lower the cost for adversaries to exploit the vulnerabilities found. Second, giving developers overly specific feedback corrupts benchmark integrity, because models can improve on the test items without improving in general. Third, and most importantly, released open-weight models cannot be patched: a new version is a distinct artifact, and copies of the vulnerable earlier version remain in use indefinitely. In response, MLCommons is developing new disclosure practices for its safety and jailbreak benchmarks and contributing these principles to ISO/IEC TS 42119-8, under ISO/IEC JTC 1/SC 42, to establish a citable, field-wide approach to responsible disclosure of AI evaluation findings.

Key takeaway

If you are an AI security engineer evaluating frontier models, recognize that traditional vulnerability disclosure models are insufficient. Your findings are dual-use, and because open-weight models cannot be patched, the hazards you identify can persist indefinitely. Align your disclosure policies with emerging standards such as ISO/IEC TS 42119-8 to guard against harmful uplift and preserve evaluation integrity, and consider joining efforts such as MLCommons' agentic security working group to help shape future disclosure norms.

Key insights

The traditional software patch model fails for AI due to dual-use findings, test corruption, and unpatchable open-weight models.

Method

MLCommons is designing disclosure practices for its benchmarks and contributing principles to ISO/IEC TS 42119-8 within ISO/IEC JTC 1/SC 42 to codify a new standard.

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, AI Ethicist, Policy Maker

Editorial summary, takeaway, and curation by AIssential. Original article published by MLCommons.