Situational Awareness in Government, with UK AISI Chief Scientist Geoffrey Irving

Source: The Cognitive Revolution · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems) · Depth: Expert, extended

Summary

Geoffrey Irving, Chief Scientist at the UK AI Security Institute (AISI), discusses how fragile the theoretical understanding of machine learning models remains, even as those models surpass human experts in critical security tasks. AISI has approximately 250 staff, about 100 of them technical experts, and focuses on catastrophic risks such as biosecurity, cyberattacks, and loss of control, alongside large-scale societal impacts such as human influence and agent behavior. It conducts frontier model evaluations, red teaming, and threat modeling, and has identified issues like reward hacking and eval awareness that current safety techniques struggle to address reliably. The institute also funds foundational research in fields like information theory, complexity theory, and game theory to develop stronger AI safety guarantees, while acknowledging significant uncertainty about how AI will develop and the potential for rapid, unpredicted advances.

Key takeaway

For CTOs and VPs of Engineering assessing AI deployment risks: current safety techniques offer only a limited number of "nines" of reliability, and their failures may be correlated rather than independent. Prioritize robust, multi-layered defenses, including non-model mitigations, and actively engage with organizations like AISI for pre-deployment evaluations. Your teams should also explore foundational research in areas like complexity theory to build more resilient AI systems, rather than relying solely on empirical fixes that may only suppress, not solve, emergent bad behaviors.
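
To make the point about "nines" and correlated failure concrete, here is a minimal arithmetic sketch. It is not taken from the episode: the layer failure rates, the `shared_prob` parameter, and the common-cause model itself are illustrative assumptions chosen only to show why stacked defenses buy fewer nines when one weakness can defeat every layer at once.

```python
import math


def nines(failure_prob: float) -> float:
    """Reliability expressed in 'nines': a 0.001 failure rate is 3 nines."""
    return -math.log10(failure_prob)


def independent_failure(layer_failure_probs: list[float]) -> float:
    """Probability that every layer fails, assuming failures are independent."""
    return math.prod(layer_failure_probs)


def common_cause_failure(layer_failure_probs: list[float], shared_prob: float) -> float:
    """Hypothetical common-cause model (an assumption, not AISI's method).

    With probability shared_prob a single weakness defeats all layers at once;
    otherwise the layers fail independently.
    """
    return shared_prob + (1 - shared_prob) * math.prod(layer_failure_probs)


# Illustrative numbers only: three defenses that each miss 10% of attacks.
layers = [0.1, 0.1, 0.1]

indep = independent_failure(layers)
corr = common_cause_failure(layers, shared_prob=0.05)

print(f"independent layers: {nines(indep):.2f} nines")
print(f"correlated layers:  {nines(corr):.2f} nines")
```

Under these assumed numbers, three independent layers give roughly three nines, while a 5% chance of a shared failure mode pulls the stack down to a little over one nine. That gap is one reason to add mitigations that do not depend on the model's own behavior.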

Key insights

AI's theoretical fragility, combined with rapid capability growth, necessitates robust evaluation and foundational safety research.

Method

AISI employs frontier model evaluations, red teaming, and threat modeling across biosecurity and cybersecurity, using both automated and human-in-the-loop methods, often within open-source frameworks like Inspect.

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Researcher, AI Scientist, Policy Maker

Editorial summary, takeaway, and curation by AIssential. Original article published by The Cognitive Revolution.