Import AI 437: Co-improving AI; RL dreams; AI labels might be annoying

2025-10-13 · Source: Import AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, AI Safety & Governance · Depth: Advanced, long

Summary

Facebook researchers advocate "co-improving AI" over self-improving superintelligence, citing the dangers of misuse and misalignment. Their paper proposes a research agenda built around human-AI collaboration in ideation, experimentation, and the joint development of safety and deployment methods, with the goal of reaching "co-superintelligence." The stated aim is faster progress, greater transparency, and a human-centered focus on AI safety.

Separately, DeepMind has released SIMA 2, a "Scalable Instructable Multiworld Agent" built on a Gemini-class model fine-tuned with gameplay data. SIMA 2 generalizes strongly to unseen 3D environments and can self-improve in games such as ASKA, where a Gemini model sets tasks and scores the agent's performance to generate fine-tuning data.

Finally, researchers at multiple universities have built SimWorld, an Unreal Engine 5 simulator that offers a graphically rich, procedural, and scriptable environment for training and testing AI agents and integrates text-to-3D models for dynamic asset generation.

Key takeaway

For CTOs and VPs of Engineering evaluating AI development strategies: consider prioritizing human-AI co-improvement frameworks to mitigate the risks associated with autonomous superintelligence. Your teams should explore advanced simulation environments like SimWorld and agent architectures such as SIMA 2, which combine frontier models with reinforcement learning and self-improvement mechanisms. Together, these offer a scalable path toward generalist agents and robots. Separately, be mindful that even seemingly simple AI policies, such as labeling requirements, can impose significant compliance costs on your operations.

Key insights

Human-AI co-improvement offers a safer path to superintelligence than autonomous self-improvement.

Principles

Co-improvement yields faster progress and greater transparency.
Policy complexity often hides beneath simple regulatory ideas.
Reinforcement learning benefits from the world knowledge built into frontier models.

Method

SIMA 2 fine-tunes a Gemini-class model with gameplay data, using a self-improving scaffold where a Gemini model sets tasks, scores performance, and generates fine-tuning data for iterative agent improvement.
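
The loop described above can be sketched roughly as follows. This is a toy illustration of the propose, attempt, score, filter, fine-tune cycle, not DeepMind's implementation: `ToyAgent`, `propose_task`, `score_trajectory`, the task names, the `keep_threshold` filter, and the scalar `skill` update are all hypothetical stand-ins. In the real system a Gemini model would play the task-setter and scorer roles, and fine-tuning would update model weights.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    task: str
    trajectory: List[str]
    score: float


@dataclass
class ToyAgent:
    """Hypothetical stand-in for the agent; `skill` is a scalar proxy for ability."""
    skill: float = 0.2

    def attempt(self, task: str, rng: random.Random) -> List[str]:
        # The toy agent succeeds with probability equal to its current skill.
        outcome = "success" if rng.random() < self.skill else "failure"
        return [f"task={task}", outcome]

    def fine_tune(self, episodes: List[Episode]) -> None:
        # Placeholder for a real fine-tuning step: each kept episode nudges skill up.
        self.skill = min(1.0, self.skill + 0.02 * len(episodes))


def propose_task(rng: random.Random) -> str:
    # Stand-in for the task-setting model; the task names are invented.
    return rng.choice(["gather wood", "build shelter", "light campfire"])


def score_trajectory(task: str, trajectory: List[str]) -> float:
    # Stand-in for the scoring model: 1.0 on success, 0.0 otherwise.
    return 1.0 if trajectory[-1] == "success" else 0.0


def self_improve(agent: ToyAgent, rounds: int = 5, episodes_per_round: int = 20,
                 keep_threshold: float = 0.5, seed: int = 0) -> List[float]:
    """Run the loop and return the agent's skill after each round."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        episodes = []
        for _ in range(episodes_per_round):
            task = propose_task(rng)
            trajectory = agent.attempt(task, rng)
            episodes.append(Episode(task, trajectory, score_trajectory(task, trajectory)))
        # Keep only well-scored episodes as fine-tuning data (an assumed filter).
        kept = [e for e in episodes if e.score >= keep_threshold]
        agent.fine_tune(kept)
        history.append(agent.skill)
    return history


if __name__ == "__main__":
    print(self_improve(ToyAgent()))
```

Keeping the task setter and the scorer as separate functions mirrors the two roles the summary attributes to the Gemini model, so either stub could be swapped for a model call without changing the loop itself.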

In practice

Use SimWorld for training RL agents in realistic 3D environments.
Integrate text-to-3D models for dynamic asset generation in simulations.
Apply self-improvement scaffolds for agent skill acquisition.

Topics

AI Safety
AI Policy
AI Simulators
Reinforcement Learning Agents
Generalist AI Models

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Researcher, Machine Learning Engineer, Policy Maker

Editorial summary, takeaway, and curation by AIssential. Original article published by Import AI.