Mistral Moderation 2411 - Mistral AI
Summary
Mistral AI has deprecated its `mistral-moderation-2411` model and the associated `moderation_llm_v1` guardrail configuration. Users are instructed to migrate to the newer `mistral-moderation-2603` model and update their guardrail configurations to `moderation_llm_v2`. The updated `mistral-moderation-2603` model introduces refined policy categories, specifically replacing the "Dangerous and Criminal Content" category with distinct "Dangerous," "Criminal," and "Jailbreaking" classifications. The document details the existing policy categories, including "Sexual," "Hate and Discrimination," "Violence and Threats," "Self-Harm," "Health," "Financial," "Law," and "PII." It also provides examples of `moderation_llm_v1` guardrail configurations and the `403` error response for blocked content.
Key takeaway
Teams using Mistral AI's moderation services must migrate from `mistral-moderation-2411` to `mistral-moderation-2603` and update their guardrail configurations to `moderation_llm_v2`. The migration aligns moderation policies with the latest category definitions, particularly the new "Dangerous," "Criminal," and "Jailbreaking" categories. Systems that are not updated remain on a deprecated model, which may affect content safety and compliance.
Key insights
Mistral AI has deprecated `mistral-moderation-2411` and directs users to `mistral-moderation-2603`, which splits "Dangerous and Criminal Content" into separate "Dangerous," "Criminal," and "Jailbreaking" categories.
Principles
- Track model deprecations and migrate moderation models promptly.
- Split broad policy categories into narrower ones when that makes classifications clearer.
Method
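The source's examples configure custom guardrails with `moderation_llm_v1` by setting a threshold for each category and defining an action, such as "block", to apply when content violates a threshold. Because `moderation_llm_v1` is deprecated, new configurations should use `moderation_llm_v2`.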
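To make the threshold-and-action idea concrete, here is a minimal local sketch in Python. It does not call Mistral's API, and it is not Mistral's schema: the `GuardrailConfig` class, its field names, the category keys, the `"allow"` fallback, and the `>=` comparison are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GuardrailConfig:
    # Hypothetical shape; field names are illustrative, not Mistral's schema.
    type: str = "moderation_llm_v1"
    thresholds: dict = field(default_factory=dict)  # category -> score in [0, 1]
    action: str = "block"


def evaluate(config, scores):
    """Return (action, violated_categories) for a dict of category scores.

    Assumption: a category is violated when its score is at or above
    the configured threshold. Categories without a score count as 0.0.
    """
    violated = sorted(
        category
        for category, threshold in config.thresholds.items()
        if scores.get(category, 0.0) >= threshold
    )
    return (config.action if violated else "allow", violated)


config = GuardrailConfig(thresholds={"sexual": 0.5, "self_harm": 0.2})
decision = evaluate(config, {"sexual": 0.1, "self_harm": 0.9})
```

In this sketch, `decision` reports the configured action together with the categories that triggered it. In a real deployment the scores would come from the moderation model, and, per the source, blocked content surfaces as a `403` error response.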
In practice
- Migrate `moderation_llm_v1` configs to `moderation_llm_v2`.
- Review new "Dangerous", "Criminal", "Jailbreaking" categories.
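The two steps above could be sketched as a small helper that rewrites a v1-style configuration into a v2-style one. The dictionary layout and the category key names (`dangerous_and_criminal_content`, `dangerous`, `criminal`, `jailbreaking`) are hypothetical stand-ins, since the summary does not give the actual schema. Copying the old combined threshold to all three new categories is only a starting point for review, not a recommended policy.

```python
import copy

# Hypothetical key names; check Mistral's documentation for the real ones.
V1_COMBINED = "dangerous_and_criminal_content"
V2_SPLIT = ("dangerous", "criminal", "jailbreaking")


def migrate_config(v1_config):
    """Return a v2-style copy of a v1-style guardrail config dict.

    The input is not modified. The combined v1 category, if present,
    is replaced by the three v2 categories, each inheriting its threshold
    unless a value for that category already exists.
    """
    v2 = copy.deepcopy(v1_config)
    v2["type"] = "moderation_llm_v2"
    thresholds = v2.get("thresholds", {})
    if V1_COMBINED in thresholds:
        old_threshold = thresholds.pop(V1_COMBINED)
        for name in V2_SPLIT:
            thresholds.setdefault(name, old_threshold)
    return v2


v1 = {
    "type": "moderation_llm_v1",
    "thresholds": {"sexual": 0.5, "dangerous_and_criminal_content": 0.3},
    "action": "block",
}
v2 = migrate_config(v1)
```

After running a helper like this, each of the three new thresholds should still be reviewed individually, since "Jailbreaking" in particular covers behavior that the old combined category may have been tuned for differently.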
Topics
- Mistral Moderation 2411
- Mistral Moderation 2603
- Content Moderation
- Guardrail Configuration
- Policy Categories
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, MLOps Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by mistral.ai via Google News.