Shafi Goldwasser Provides 'A Cryptographic Perspective on Trustworthy AI'
Summary
Shafi Goldwasser, a pioneer of modern cryptography, presented a cryptographic perspective on trustworthy AI, stressing that the rapid adoption of machine learning models such as large language models (LLMs) makes their reliability and trustworthiness critical. She outlined a cryptographic recipe for addressing AI trust issues: model the adversary, define what counts as sufficient security, propose a solution, and prove that it meets the definition, often under computational hardness assumptions. Goldwasser detailed five key challenges: privacy of training data; verification of model properties (correctness, fairness, robustness); robustness against adversarial data shifts and backdoors; alignment with human values; and ownership, meaning protection against model stealing. She highlighted homomorphic encryption and trusted hardware for privacy, interactive proofs for verification, and cryptographic techniques such as time-lock puzzles for alignment and robustness against backdoors, noting that simple filtering is often insufficient for safety.
Key takeaway
AI scientists and developers who build or deploy machine learning models in sensitive applications should adopt a security-first mindset by integrating cryptographic principles. Focus on defining adversarial models and proving the trustworthiness of your AI systems rather than relying solely on empirical accuracy. Consider homomorphic encryption for data privacy during training and self-proving models for verifiable correctness, especially for critical outputs. Be aware that simple filtering mechanisms are often inadequate for ensuring model alignment and safety against sophisticated "jailbreak" attacks.
Key insights
Cryptography offers a robust framework for defining, achieving, and proving trustworthiness in AI systems against adversarial threats.
Principles
- Trust in AI requires adversarial modeling and provable security.
- Worst-case guarantees are superior to average-case accuracy for critical AI applications.
- Simple input/output filtering is insufficient for AI safety and alignment.
Method
The cryptographic recipe for trustworthy AI involves modeling adversaries, defining security, proposing solutions (e.g., homomorphic encryption, interactive proofs), and providing mathematical proofs of security under computational assumptions.
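As an illustration of the "define security" and "prove it" steps, the following is one standard way such a definition is phrased in cryptography. It is a generic textbook template, not a formula taken from the talk; the symbols (security parameter, game, adversary) are assumed notation.

```latex
% Assumed notation (not from the source):
%   \lambda      security parameter
%   \mathcal{A}  probabilistic polynomial-time (PPT) adversary
%   G            security game in which \mathcal{A} must guess a hidden bit
%   negl         a function that shrinks faster than any inverse polynomial
\[
  \mathsf{Adv}_{\mathcal{A}}(\lambda)
  \;=\;
  \left| \Pr\!\left[ \mathcal{A} \text{ wins } G(\lambda) \right] - \tfrac{1}{2} \right|
  \;\le\; \mathsf{negl}(\lambda)
  \qquad \text{for every PPT } \mathcal{A}.
\]
```

A proof "under a computational assumption" then typically takes the form of a reduction: any adversary with non-negligible advantage in the game could be turned into an efficient algorithm for a problem believed to be hard.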
In practice
- Utilize homomorphic encryption for privacy-preserving ML training.
- Implement verifier-guided reinforcement learning for self-proving models.
- Post-process models to mitigate undetectable backdoors.
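To make the first item above concrete, here is a minimal sketch of the core idea behind homomorphic encryption: computing on data while it stays encrypted. It uses textbook Paillier encryption, which supports only addition and scaling of ciphertexts. The summary does not say which scheme the talk discussed, so Paillier is an assumption chosen because it fits in a few lines. The primes are toy-sized and all function names are hypothetical; this is for illustration only and must not be used for real data.

```python
import math
import secrets

# Toy parameters (assumption for illustration): two Mersenne primes.
# Real deployments use primes of 1024 bits or more from a vetted library.
P = 2**31 - 1
Q = 2**61 - 1


def keygen(p=P, q=Q):
    """Return (public modulus n, private key (lam, mu))."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, n * n)


if __name__ == "__main__":
    n, priv = keygen()
    c1, c2 = encrypt(n, 20), encrypt(n, 22)
    # The party holding only n, c1, c2 never sees 20 or 22.
    total = add_encrypted(n, c1, c2)
    print(decrypt(n, priv, total))  # 42
```

Addition and scaling are enough for encrypted sums and linear scoring. Training a full model under encryption generally needs schemes that also support multiplication of ciphertexts, which this sketch does not provide.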
Topics
- Trustworthy AI
- Cryptographic AI
- Homomorphic Encryption
- AI Model Robustness
- LLM Alignment
Best for: AI Scientist, AI Researcher, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by MIT CSAIL.