Bioinfohazards: Jassi Pannu on Controlling Dangerous Data from which AI Models Learn
Summary
Jassi Pannu, Assistant Professor at Johns Hopkins, discusses the escalating biosecurity risks posed by advanced AI models and the proliferation of biological data. The conversation highlights how current AI capabilities, such as those demonstrated by Anthropic's Opus 4.6 and Andrej Karpathy's AutoResearch framework, let models troubleshoot lab experiments and make research progress autonomously. Such models could exploit publicly available biological data such as the smallpox genome sequence. Pannu advocates a Biosecurity Data Level (BDL) framework, modeled on existing biosafety levels, to control access to a small but critical subset (an estimated 1%) of functional biological data: data that links pathogen sequences to dangerous properties such as transmissibility or immune evasion. The approach aims to keep the vast majority of biological data openly accessible while restricting future data generation on pandemic-capable pathogens. The goal is to prevent misuse by extremist groups or lone actors, who are considered a greater threat than nation states in this domain.
Key takeaway
For CTOs and VPs of Engineering/Data evaluating AI strategy in biotech: current frontier models can already troubleshoot experiments and make research progress autonomously, so they could exploit sensitive biological data. Your teams should prioritize data access controls, especially for functional pathogen data, to keep AI systems from learning dangerous capabilities. Consider adopting a Biosecurity Data Level framework and investing in secure research environments to manage risk while still supporting beneficial research, and ensure that models trained on sensitive data are not publicly released.
Key insights
Controlling access to critical biological data is essential to mitigating the risk that AI is used to design dangerous pathogens.
Principles
- AI model capabilities are directly influenced by training data.
- Biosecurity requires a layered, defense-in-depth strategy.
- Data controls enable differential access, giving defensive use cases privileged access to restricted data.
Method
Implement a five-tiered Biosecurity Data Level (BDL) framework, from BDL0 (open access) to BDL4 (highest control), for biological data. This involves restricting access to functional data linking pathogens to dangerous properties, often via Trusted Research Environments.
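The tiering can be pictured as an ordered access check. This is a minimal sketch, not Pannu's specification: the source describes only BDL0 (open access) and BDL4 (highest control), so the meaning of the intermediate tiers, the hypothetical `Requester` type and clearance model, and the assumed rule that BDL3 and above are reachable only from a Trusted Research Environment (`TRE_REQUIRED_FROM`) are all illustrative assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class BDL(IntEnum):
    """Biosecurity Data Levels, ordered from open (BDL0) to most controlled (BDL4).

    Only the endpoints come from the source; BDL1-BDL3 are placeholders.
    """
    BDL0 = 0  # open access
    BDL1 = 1
    BDL2 = 2
    BDL3 = 3
    BDL4 = 4  # highest control


# Hypothetical policy: data at or above this level may only be used inside
# a Trusted Research Environment (TRE). The threshold is an assumption.
TRE_REQUIRED_FROM = BDL.BDL3


@dataclass(frozen=True)
class Requester:
    """Hypothetical requester model used only for this illustration."""
    name: str
    clearance: BDL    # highest tier this requester is approved for
    inside_tre: bool  # whether the request originates inside a TRE


def can_access(dataset_level: BDL, requester: Requester) -> bool:
    """Return True if the requester may read a dataset at `dataset_level`."""
    if dataset_level == BDL.BDL0:
        return True  # open data stays open to everyone
    if requester.clearance < dataset_level:
        return False  # not approved for this tier
    if dataset_level >= TRE_REQUIRED_FROM and not requester.inside_tre:
        return False  # sensitive tiers are only reachable from a TRE
    return True


if __name__ == "__main__":
    public = Requester("anyone", BDL.BDL0, inside_tre=False)
    vetted = Requester("vetted-lab", BDL.BDL4, inside_tre=True)
    print(can_access(BDL.BDL0, public))  # True
    print(can_access(BDL.BDL2, public))  # False
    print(can_access(BDL.BDL4, vetted))  # True
```

Because the tiers are an `IntEnum`, "approved for this tier or higher" is a plain comparison, and the TRE requirement is layered on top as a separate check, echoing the defense-in-depth principle above.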
In practice
- Exclude human-infecting viral sequences from biofoundation model training.
- Utilize Trusted Research Environments for sensitive biological data.
- Support mandatory pre-synthesis screening by DNA manufacturers.
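The first practice, excluding human-infecting viral sequences from biofoundation model training, amounts to a metadata filter applied before training. This is a hypothetical sketch: the `SequenceRecord` fields, the host label `"Homo sapiens"`, and the choice to also drop viral records that have no host annotation are assumptions. A real pipeline would depend on the completeness of the taxonomy and host metadata in its source databases.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SequenceRecord:
    """Hypothetical training-corpus record with taxonomy and host metadata."""
    accession: str
    organism_kind: str  # e.g. "virus", "bacterium", "plant"
    # Annotated host species; an empty set means the host is unknown.
    hosts: frozenset[str] = field(default_factory=frozenset)


def keep_for_training(rec: SequenceRecord, drop_unknown_viral_hosts: bool = True) -> bool:
    """Return False for viral records annotated as infecting humans."""
    if rec.organism_kind != "virus":
        return True  # non-viral records are out of scope for this rule
    if "Homo sapiens" in rec.hosts:
        return False  # human-infecting virus: exclude
    if not rec.hosts and drop_unknown_viral_hosts:
        return False  # conservative: viruses with no host annotation are excluded too
    return True


def filter_corpus(records: list[SequenceRecord]) -> list[SequenceRecord]:
    """Keep only the records that pass the exclusion rule."""
    return [r for r in records if keep_for_training(r)]


if __name__ == "__main__":
    records = [
        SequenceRecord("A1", "virus", frozenset({"Homo sapiens"})),
        SequenceRecord("A2", "virus", frozenset({"Sus scrofa"})),
        SequenceRecord("A3", "bacterium"),
        SequenceRecord("A4", "virus"),  # no host annotation
    ]
    print([r.accession for r in filter_corpus(records)])  # ['A2', 'A3']
```

Dropping unannotated viral records gives up some training data in exchange for a lower chance that a human-infecting sequence slips through because of missing metadata.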
Topics
- Biosecurity
- AI Safety
- Biological Data Control
- Gain-of-Function Research
- Biofoundation Models
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, Policy Maker, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by The Cognitive Revolution.