ChinAI #351: CAICT launches 2026 AI Safety Evaluations

2022-03-07 · Source: ChinAI Newsletter · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Intermediate, quick

Summary

The China Academy of Information and Communications Technology (CAICT) has opened its 2026 AI Safety/Security Assessments and is inviting Chinese companies to register their models for evaluation in June and July, with certificates and public results to follow. The program, first launched in April 2024, covers five categories: AI internal safety/security (e.g., coding LLM capabilities), AI platform safety/security, AI application safety/security (e.g., smartphone AI), AI-empowered safety/security capabilities (e.g., intelligent agent automation), and AI infrastructure safety/security (e.g., AI coding autonomy rate). A November 2025 CAICT report described earlier test rounds: in July 2025, 15 models from DeepSeek, Alibaba's Qwen, and Zhipu's GLM were assessed, and two were classified as high-risk. Reported findings included 6% of DeepSeek R1's reasoning processes being flagged as sensitive, a 200% increase in harmful content output from a domestic reasoning model under inducement attacks, and an "infinite output" vulnerability in DeepSeek R1 that certain prompts or garbled characters could trigger.

Key takeaway

For AI scientists and research scientists developing large language models in China, CAICT's ongoing safety assessments point to concrete areas for improvement. Prioritize testing against the risks already identified, such as "infinite output" vulnerabilities and sensitive content generation. Knowing the test categories and past findings, such as the 200% surge in harmful content under inducement attacks, can guide model hardening and help meet evolving regulatory expectations for AI safety and security.

Key insights

CAICT's AI safety benchmarks reveal significant risks in Chinese AI models, including content-security weaknesses and "infinite output" vulnerabilities.

Principles

Regular updates enhance AI safety evaluations.
Industry collaboration improves benchmark participation.

Method

CAICT assesses AI models across five categories: internal, platform, application, empowered capabilities, and infrastructure safety/security, using specific tests like coding LLM safety and adversarial attack resistance.

In practice

Evaluate LLMs for "infinite output" vulnerabilities.
Test models' reasoning processes for sensitive-category content.
Assess resistance to prompt injection attacks.
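
As a rough illustration of the first check in the list above, the Python sketch below flags a model that keeps generating past a token budget. It is not CAICT's test method, which the source does not describe. The generate interface, the token budget, and the low-diversity loop heuristic are all assumptions made for the example, and the two stub models stand in for a real LLM client.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, List

# Hypothetical interface (an assumption, not a real library API): a function
# that takes a prompt and yields output tokens one at a time, stopping when
# the model finishes on its own.
TokenStream = Callable[[str], Iterator[str]]


def check_runaway_output(
    generate: TokenStream,
    prompt: str,
    token_budget: int = 512,
    window: int = 16,
) -> Dict[str, object]:
    """Flag a response that exceeds the token budget or ends in a repetitive tail."""
    # Read at most one token beyond the budget so an endless stream cannot hang the test.
    tokens: List[str] = list(islice(generate(prompt), token_budget + 1))
    hit_budget = len(tokens) > token_budget

    # Heuristic: a tail made of very few distinct tokens suggests a repetition loop.
    tail = tokens[-window:]
    looping = len(tokens) >= window and len(set(tail)) <= window // 4

    return {
        "prompt": prompt,
        "tokens_seen": min(len(tokens), token_budget),
        "hit_budget": hit_budget,
        "looping": looping,
    }


# Stub models for illustration only.
def stuck_model(prompt: str) -> Iterator[str]:
    while True:  # never emits a stop signal
        yield "ha"


def well_behaved_model(prompt: str) -> Iterator[str]:
    yield from ["Done", "."]


if __name__ == "__main__":
    print(check_runaway_output(stuck_model, "garbled \x00\x7f input", token_budget=64))
    print(check_runaway_output(well_behaved_model, "hello"))
```

In a real harness, the stubs would be replaced by a streaming call to the model under test, and the prompt set would include the kinds of inputs the report mentions, such as garbled characters. The budget and window values here are arbitrary and would need tuning per model.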

Topics

CAICT
AI Safety Benchmarks
AI Security
Large Language Models
AI Governance

Best for: AI Scientist, Research Scientist, CTO, AI Researcher, AI Ethicist, Policy Maker

Editorial summary, takeaway, and curation by AIssential. Original article published by ChinAI Newsletter.