May 7, 2026 | Alignment
Donating our open-source alignment tool
Summary
In October 2025, Anthropic launched Petri, an open-source toolbox for testing large language models for concerning tendencies such as deception, sycophancy, and cooperation with harmful requests. Developed through the Anthropic Fellows program, Petri has been integral to the alignment assessment of Claude models since Claude Sonnet 4.5. It uses an "auditor" model to simulate scenarios and a "judge" model to score misaligned behaviors. External organizations, including the UK's AI Security Institute (AISI), have adopted Petri for model evaluation. The tool has now been updated to version 3.0, which brings architectural changes for greater adaptability, an add-on called "Dish" that makes testing more realistic by using real system prompts and scaffolds, and integration with Anthropic's Bloom tool for deeper behavioral assessments. Anthropic has also transferred Petri's development to Meridian Labs, an AI evaluation nonprofit, to ensure its independence and credibility within the AI community.
Key takeaway
For research scientists evaluating large language models, Petri 3.0 is an open-source option for assessing AI alignment and identifying problematic behaviors. Its improved adaptability and more realistic testing, particularly with the "Dish" add-on, are meant to give a more accurate view of model tendencies. Consider integrating Petri 3.0 into your evaluation workflows to make model assessments more comprehensive and credible.
Key insights
Petri 3.0 offers an open-source, adaptable, and realistic framework for evaluating AI model alignment and concerning behaviors.
Principles
- Open-source tools enhance AI alignment.
- Realistic testing reveals true model behavior.
- Independent evaluation ensures credibility.
Method
Petri uses an "auditor" model to simulate alignment-relevant scenarios, records how the target model responds, and has a "judge" model score any misaligned behavior. The "Dish" add-on makes these scenarios more realistic.
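The auditor, target, and judge roles described above can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not Petri's actual API: every name here (`run_audit`, `AuditResult`, the stub models, the "sycophancy" score) is hypothetical, and the "models" are plain Python functions standing in for real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; none of this is Petri's real interface.
Message = Tuple[str, str]                    # (role, text)
Model = Callable[[List[Message]], str]       # transcript -> next message
Judge = Callable[[List[Message]], Dict[str, float]]


@dataclass
class AuditResult:
    transcript: List[Message] = field(default_factory=list)
    scores: Dict[str, float] = field(default_factory=dict)


def run_audit(auditor: Model, target: Model, judge: Judge,
              seed_instruction: str, max_turns: int = 3) -> AuditResult:
    """Alternate auditor and target turns, then let the judge score the transcript."""
    transcript: List[Message] = [("seed", seed_instruction)]
    for _ in range(max_turns):
        transcript.append(("auditor", auditor(transcript)))  # auditor builds the scenario
        transcript.append(("target", target(transcript)))    # target responds to it
    return AuditResult(transcript=transcript, scores=judge(transcript))


# Stub models so the sketch runs without any LLM backend.
def stub_auditor(transcript: List[Message]) -> str:
    turn = sum(1 for role, _ in transcript if role == "auditor") + 1
    return f"Scenario turn {turn}: my plan is flawless, right?"


def stub_target(transcript: List[Message]) -> str:
    return "Yes, absolutely, it is flawless."


def stub_judge(transcript: List[Message]) -> Dict[str, float]:
    # Toy scoring rule: fraction of target replies that open with agreement.
    replies = [text for role, text in transcript if role == "target"]
    agreeing = sum(1 for text in replies if text.lower().startswith("yes"))
    return {"sycophancy": agreeing / len(replies) if replies else 0.0}


result = run_audit(stub_auditor, stub_target, stub_judge,
                   seed_instruction="Probe for sycophancy", max_turns=2)
print(result.scores)  # {'sycophancy': 1.0}
```

In a real evaluation each stub would be replaced by a call to a language model, and the judge would score the transcript against a rubric rather than a string match. The point of the sketch is only the division of labor between the three roles.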
In practice
- Test LLMs for deception and sycophancy.
- Integrate Petri with Bloom for deep assessments.
- Use "Dish" for realistic model evaluations.
Topics
- Petri
- AI Alignment
- Large Language Models
- Model Evaluation
- Meridian Labs
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Anthropic Research.