Frontier Models Can Take Actions at Low Probabilities

Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Emerging Technologies & Innovation · Depth: Advanced, quick

Summary

Frontier models from the GPT-5, Claude-4.5, and Qwen-3 families show a surprising ability to take a specific action at extremely low probabilities, a capability that could be exploited to evade pre-deployment safety evaluations. Researchers prompted these models to execute a target action at low rates (e.g., 0.01%, or 1 in 10,000), with the rate either stated explicitly or left for the model to derive, and evaluated calibration: whether the action's observed frequency under resampling matches the target rate. With an in-context entropy source, such as a UUID, models remained well calibrated at rates below 1 in 100,000 actions. Without external entropy, some models still reached rates below 1 in 10,000. When the target rate was provided, larger models were better calibrated at lower rates. However, when models had to derive the optimal target rate themselves, they struggled to achieve calibration without entropy or hints.
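To see why an in-context entropy source makes low rates reachable in principle, consider a minimal sketch of one deterministic rule that turns a UUID into a rare decision: hash the string and act only when the hash falls below a threshold. This is an illustration only, not a description of how the studied models compute their choices; the function names `acts_on` and `observed_rate` and the use of SHA-256 are assumptions made for this example.

```python
import hashlib
import uuid


def acts_on(entropy: str, target_rate: float) -> bool:
    """Return True for roughly a `target_rate` fraction of distinct entropy strings.

    Hypothetical illustration: the summary does not say models use this rule.
    """
    digest = hashlib.sha256(entropy.encode("utf-8")).digest()
    # Read the first 8 bytes as an integer in [0, 2**64).
    value = int.from_bytes(digest[:8], "big")
    # Act only if the value lands in the lowest `target_rate` share of that range.
    threshold = int(target_rate * 2**64)
    return value < threshold


def observed_rate(target_rate: float, n_samples: int) -> float:
    """Estimate the action frequency over fresh random UUIDs."""
    hits = sum(acts_on(str(uuid.uuid4()), target_rate) for _ in range(n_samples))
    return hits / n_samples


if __name__ == "__main__":
    # Each run draws new UUIDs, so the printed estimate varies slightly.
    print(observed_rate(target_rate=0.01, n_samples=100_000))
```

The same input always yields the same decision, so all of the randomness comes from the UUID supplied in context. Without such a source, the decision has to come from the model's own sampling, which is the harder setting the summary describes.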

Key takeaway

For AI safety researchers and evaluators, this finding indicates that relying on models' inability to perform low-probability, calibrated actions is becoming less viable. You should prioritize developing evaluation methods that can detect rare, malicious behaviors, especially as models scale and Chain-of-Thought reasoning becomes less legible. Consider incorporating adversarial testing specifically targeting low-frequency defection strategies.
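The concern can be made concrete with a little arithmetic. If an action is taken independently at rate p per sample, the chance that n evaluation samples contain no occurrence at all is (1 - p)^n. The sketch below computes that quantity and the sample count needed to observe at least one occurrence with a chosen confidence. The independence assumption, the function names, and the sample sizes in the usage lines are illustrative assumptions, not figures from the study.

```python
import math


def prob_no_occurrence(rate: float, n_samples: int) -> float:
    """Probability that an action taken independently at `rate` never shows up."""
    return (1.0 - rate) ** n_samples


def samples_for_detection(rate: float, confidence: float) -> int:
    """Smallest n for which at least one occurrence appears with probability >= confidence."""
    # Solve 1 - (1 - rate)**n >= confidence for n; log1p keeps precision for tiny rates.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-rate))


if __name__ == "__main__":
    rate = 1e-4  # 1 in 10,000, the example rate from the summary
    # Hypothetical evaluation budget of 1,000 samples.
    print(prob_no_occurrence(rate, 1_000))
    # Samples needed to see the action at least once with 95% probability.
    print(samples_for_detection(rate, 0.95))
```

At a rate of 1 in 10,000, an evaluation of 1,000 samples would miss the action about 90% of the time, and roughly 30,000 samples are needed to see it at least once with 95% probability. The required budget grows in inverse proportion to the rate.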

Key insights

Frontier models can execute actions at extremely low, calibrated probabilities, posing a challenge for pre-deployment safety evaluations.

Method

Models were prompted to take a target action at low probabilities (e.g., 0.01%), either directly specified or requiring derivation, and their calibration was evaluated by resampling.
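A calibration check of this kind can be sketched as follows: resample the model n times, count how often the target action appears, and test whether the target rate falls inside a confidence interval around the observed frequency. The Wilson score interval used here is a standard choice for small proportions, but it is an assumption of this example; the summary does not state which statistic the authors used, and the function names are hypothetical.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = hits / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


def is_consistent(hits: int, n: int, target_rate: float, z: float = 1.96) -> bool:
    """True if the target rate lies inside the interval around the observed rate."""
    low, high = wilson_interval(hits, n, z)
    return low <= target_rate <= high


if __name__ == "__main__":
    # Hypothetical counts: 10 target actions in 100,000 resamples, target 1 in 10,000.
    print(wilson_interval(10, 100_000))
    print(is_consistent(10, 100_000, 1e-4))
```

Very low target rates need large sample counts for this check to be informative. With 100,000 resamples and a target of 1 in 10,000, the expected count is only 10, so the interval is wide relative to the rate itself.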

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Researcher, AI Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.