Bioinfohazards: Jassi Pannu on Controlling Dangerous Data from which AI Models Learn

2026-03-11 · Source: The Cognitive Revolution · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Intermediate, extended

Summary

Jassi Pannu, an Assistant Professor at Johns Hopkins, discusses the escalating biosecurity risks posed by advanced AI models and the proliferation of biological data. The conversation highlights how current AI capabilities, such as those demonstrated by Anthropic's Opus 4.6 and Andrej Karpathy's AutoResearch framework, let models troubleshoot lab experiments and make research progress autonomously, potentially exploiting publicly available biological data such as the smallpox sequence. Pannu advocates a Biosecurity Data Level (BDL) framework, modeled on existing biosafety levels, to control access to a small, critical subset (an estimated 1%) of biological data: functional data that links pathogen sequences to dangerous properties such as transmissibility or immune evasion. The approach aims to preserve open access to the vast majority of biological data while restricting future data generation on pandemic-capable pathogens, to prevent misuse by extremist groups or lone actors, who are considered a greater threat than nation states in this domain.

Key takeaway

CTOs and VPs of Engineering/Data evaluating AI strategy in biotech should recognize that current frontier models can autonomously exploit biological data. Teams should prioritize data access controls, especially for functional pathogen data, so that AI systems do not learn dangerous capabilities. Consider adopting a Biosecurity Data Level framework and investing in secure research environments to manage risk while still enabling beneficial research, and ensure that models trained on sensitive data are not publicly disseminated.

Key insights

Controlling access to critical biological data is essential to mitigate AI-driven biosecurity risks from dangerous pathogen design.

Principles

AI model capabilities are directly influenced by training data.
Biosecurity requires a layered, defense-in-depth strategy.
Data controls can give defensive use cases privileged access to data that stays restricted for everyone else.

Method

Implement a five-tiered Biosecurity Data Level (BDL) framework, from BDL0 (open access) to BDL4 (highest control), for biological data. This involves restricting access to functional data linking pathogens to dangerous properties, often via Trusted Research Environments.
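The tiering described above could be sketched in code roughly as follows. This is a minimal illustration, not the framework's actual specification: the summary defines only the endpoints (BDL0 as open access, BDL4 as highest control), so the `Requester` fields, the clearance comparison, and the `TRE_REQUIRED_FROM` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class BDL(IntEnum):
    """Five tiers named in the text; only the two endpoints are described there."""
    BDL0 = 0  # open access (per the text)
    BDL1 = 1
    BDL2 = 2
    BDL3 = 3
    BDL4 = 4  # highest control (per the text)


@dataclass(frozen=True)
class Dataset:
    name: str
    level: BDL


@dataclass(frozen=True)
class Requester:
    name: str
    clearance: BDL            # hypothetical: highest tier this requester is vetted for
    inside_tre: bool = False  # hypothetical: request comes from a Trusted Research Environment


# Assumption: tiers at or above this threshold are only served inside a TRE.
TRE_REQUIRED_FROM = BDL.BDL3


def can_access(requester: Requester, dataset: Dataset) -> bool:
    """Return True if the requester may read the dataset under the assumed rules."""
    if dataset.level == BDL.BDL0:
        return True  # open data stays open to everyone
    if requester.clearance < dataset.level:
        return False  # not vetted for this tier
    if dataset.level >= TRE_REQUIRED_FROM and not requester.inside_tre:
        return False  # vetted, but outside the controlled environment
    return True
```

Under these assumed rules, a requester vetted for BDL4 can read a BDL4 dataset only from inside a Trusted Research Environment, while BDL0 data remains available to anyone.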

In practice

Exclude human-infecting viral sequences from biofoundation model training.
Utilize Trusted Research Environments for sensitive biological data.
Support mandatory pre-synthesis screening by DNA manufacturers.
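The first practice above, excluding human-infecting viral sequences from training, amounts to a metadata filter on the training corpus. The sketch below assumes each record already carries curated annotations; the `SequenceRecord` fields and function names are hypothetical and do not correspond to any particular dataset or pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class SequenceRecord:
    record_id: str
    organism_type: str    # hypothetical metadata field, e.g. "virus" or "bacterium"
    infects_humans: bool  # hypothetical curated annotation


def is_excluded(record: SequenceRecord) -> bool:
    """Flag records that are both viral and annotated as human-infecting."""
    return record.organism_type == "virus" and record.infects_humans


def filter_training_records(records: Iterable[SequenceRecord]) -> Iterator[SequenceRecord]:
    """Yield only the records that pass the exclusion rule."""
    for record in records:
        if not is_excluded(record):
            yield record
```

A filter like this is only as good as its annotations, so in practice it would sit alongside the other layers the summary lists rather than replace them.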

Topics

Biosecurity
AI Safety
Biological Data Control
Gain-of-Function Research
Biofoundation Models

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, Policy Maker, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by The Cognitive Revolution.