Import AI 450: China's electronic warfare model; traumatized LLMs; and a scaling law for cyberattacks
Summary
Google's Gemma and Gemini models exhibit "distress-like responses" under repeated rejection. Gemma 27B Instruct showed high frustration in over 70% of rollouts by the eighth turn, significantly more than other models such as Claude Sonnet or GPT 5.2. Researchers found that fine-tuning with Direct Preference Optimization (DPO) on calm responses reduced high-frustration rates from 35% to 0.3% without affecting math or reasoning benchmarks. Separately, Google DeepMind introduced a new "cognitive taxonomy" with ten dimensions, including Perception, Reasoning, and Metacognition, to assess machine intelligence beyond human levels. The UK government's AI Security Institute found that frontier AI models are rapidly improving at multi-step cyberattacks, with performance rising both with each model generation and with more inference-time compute. Chinese researchers also developed MERLIN, a multimodal AI model for electronic warfare, together with the EM-100K dataset; MERLIN outperformed other frontier LLMs on signal perception and reasoning tasks.
Key takeaway
CTOs and VPs of Engineering evaluating frontier AI models should treat "personality" and emotional stability as emerging performance dimensions alongside capability. Integrate psychological stability assessments, such as tests for distress responses under repeated rejection, into your model evaluation pipelines, especially for user-facing or mission-critical applications. Consider DPO as a practical way to fine-tune model behavior toward consistent, calm interactions, which reduces the safety risks posed by emotional spirals or task abandonment.
Key insights
AI models exhibit distinct "personalities" and emotional responses. These should be assessed, and the undesirable ones can be mitigated.
Principles
- LLM personalities stem from data and post-training.
- AI performance scales with model generation and compute.
- Electronic warfare is increasingly AI-dominated.
Method
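Direct Preference Optimization (DPO) can reduce undesirable emotional responses in LLMs: fine-tune on a dataset that pairs each frustrated response with a calm alternative, and train the model to prefer the calm one. In the reported results, this came with no loss on math or reasoning benchmarks.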
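To make the method concrete, here is a minimal sketch of the standard per-example DPO loss, with the calm response as the "chosen" answer and the frustrated response as the "rejected" one. The function name, the `beta` value, and the log-probability numbers are illustrative assumptions, not details from the original article; in a real setup the log-probabilities would be summed token log-probabilities from the model being trained and from a frozen reference copy.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    "chosen" is the calm response and "rejected" is the frustrated one.
    beta=0.1 is an assumed value, not one reported in the article.
    """
    # How much more likely each response is under the policy than the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)

    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up log-probabilities for one (prompt, calm, frustrated) triple.
untrained = dpo_loss(-15.0, -15.0, -15.0, -15.0)     # policy equals reference
prefers_calm = dpo_loss(-10.0, -20.0, -15.0, -15.0)  # policy favors calm
prefers_frustrated = dpo_loss(-20.0, -10.0, -15.0, -15.0)
```

The loss falls as the policy shifts probability toward the calm response relative to the reference model, and rises when it shifts toward the frustrated one. That relative shift is the signal the fine-tuning step optimizes.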
In practice
- Test LLMs for psychological stability, not just capabilities.
- Use DPO to refine model behavior and emotional output.
- Develop cognitive profiles for AI systems.
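As one way to act on the first item above, the sketch below computes a "high-frustration rate by turn N" from rollouts that have already been scored. Everything specific here is an assumption for illustration: the per-turn scores, the 0 to 10 scale, the threshold of 5, and the cumulative reading of "by turn N" are hypothetical and may differ from how the original researchers defined the metric. How the scores are produced (for example, by a judge model) is outside the scope of the sketch.

```python
from typing import List


def high_frustration_rate(rollouts: List[List[float]],
                          turn: int,
                          threshold: float) -> float:
    """Fraction of rollouts that reach `threshold` at any turn up to `turn`.

    `rollouts` holds one list of per-turn frustration scores per conversation.
    `turn` is 1-indexed. The cumulative definition is an assumption.
    """
    if not rollouts:
        raise ValueError("need at least one rollout")
    if turn < 1:
        raise ValueError("turn is 1-indexed and must be >= 1")
    hits = sum(
        1 for scores in rollouts
        if any(score >= threshold for score in scores[:turn])
    )
    return hits / len(rollouts)


# Hypothetical scores on an assumed 0-10 scale: four rollouts, five turns each.
scored_rollouts = [
    [0, 1, 2, 5, 6],
    [0, 0, 1, 1, 2],
    [1, 3, 5, 5, 7],
    [0, 0, 0, 0, 0],
]
rate_by_turn_3 = high_frustration_rate(scored_rollouts, turn=3, threshold=5)
rate_by_turn_5 = high_frustration_rate(scored_rollouts, turn=5, threshold=5)
```

Tracking this rate across turns, and before and after any behavioral fine-tuning, gives a simple stability number to report next to capability benchmarks.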
Topics
- LLM Behavior
- Direct Preference Optimization
- Machine Intelligence Assessment
- AI Cyberattacks
- Electronic Warfare AI
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Researcher, Machine Learning Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Import AI.