Import AI 437: Co-improving AI; RL dreams; AI labels might be annoying
Summary
Facebook researchers advocate for "co-improving AI" over self-improving superintelligence, citing dangers from misuse and misalignment. Their paper proposes a research agenda focused on human-AI collaboration in ideation, experimentation, and joint development of safety and deployment methods, with the goal of reaching "co-superintelligence." The authors argue this approach would deliver faster progress, greater transparency, and a human-centered focus on AI safety.
Separately, DeepMind has released SIMA 2, a "Scalable Instructable Multiworld Agent" built on a Gemini-class model fine-tuned with gameplay data. SIMA 2 generalizes well to unseen 3D environments. It can also self-improve in games like ASKA, where a Gemini model sets tasks and scores performance to generate fine-tuning data.
Additionally, researchers from multiple universities have built SimWorld, a new simulator based on Unreal Engine 5. It offers a graphically rich, procedural, and scriptable environment for training and testing AI agents, and it integrates text-to-3D models for dynamic asset generation.
Key takeaway
For CTOs and VPs of Engineering evaluating AI development strategies, consider prioritizing human-AI co-improvement frameworks to mitigate risks associated with autonomous superintelligence. Your teams should explore advanced simulation environments like SimWorld and agent architectures such as SIMA 2, which combine frontier models with reinforcement learning and self-improvement mechanisms. This combination offers a scalable path for developing generalist agents and robots. Be mindful, however, that even seemingly simple AI policies, like labeling, can impose significant compliance costs on your operations.
Key insights
Human-AI co-improvement offers a safer path to superintelligence than autonomous self-improvement.
Principles
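- Co-improvement is argued to yield faster progress and greater transparency than autonomous self-improvement.
- Simple-sounding regulatory ideas, such as AI labeling, often hide significant compliance complexity.
- Reinforcement learning agents benefit from the world knowledge already built into frontier models.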
Method
SIMA 2 fine-tunes a Gemini-class model with gameplay data, using a self-improving scaffold where a Gemini model sets tasks, scores performance, and generates fine-tuning data for iterative agent improvement.
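The loop described above can be illustrated with a toy sketch. This is not DeepMind's implementation: the names `ToyAgent`, `propose_task`, `score`, and `self_improve`, the success-probability "skill" table, and the score threshold are all hypothetical stand-ins chosen for illustration. In SIMA 2 the task setter and scorer are a Gemini model and the agent is a fine-tuned Gemini-class model; here they are replaced with simple functions so the control flow is runnable.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for the agent: maps a task name to a success probability."""
    skill: dict = field(default_factory=dict)

    def attempt(self, task, rng):
        # A "trajectory" here is just the task and whether the attempt succeeded.
        return {"task": task, "success": rng.random() < self.skill.get(task, 0.1)}

    def fine_tune(self, examples):
        # Toy "fine-tuning": nudge skill upward once per kept example.
        for ex in examples:
            current = self.skill.get(ex["task"], 0.1)
            self.skill[ex["task"]] = min(1.0, current + 0.1)


def propose_task(rng, tasks):
    """Hypothetical stand-in for the task-setting model."""
    return rng.choice(tasks)


def score(trajectory):
    """Hypothetical stand-in for the scoring model: 1.0 on success, else 0.0."""
    return 1.0 if trajectory["success"] else 0.0


def self_improve(agent, tasks, rounds=5, episodes=20, threshold=0.5, seed=0):
    """Run set-task -> attempt -> score -> filter -> fine-tune for several rounds."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        kept = []
        for _ in range(episodes):
            task = propose_task(rng, tasks)
            trajectory = agent.attempt(task, rng)
            # Only trajectories the scorer rates highly become training data.
            if score(trajectory) >= threshold:
                kept.append(trajectory)
        agent.fine_tune(kept)
        history.append(len(kept))  # number of kept examples in this round
    return history


if __name__ == "__main__":
    agent = ToyAgent()
    print(self_improve(agent, ["chop_wood", "build_shelter"]))
    print(agent.skill)
```

The design point the sketch tries to capture is that the scorer acts as a filter: only attempts it rates highly are fed back as fine-tuning data, so each round trains on the agent's own best behavior.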
In practice
- Use SimWorld for training RL agents in realistic 3D environments.
- Integrate text-to-3D models for dynamic asset generation in simulations.
- Apply self-improvement scaffolds for agent skill acquisition.
Topics
- AI Safety
- AI Policy
- AI Simulators
- Reinforcement Learning Agents
- Generalist AI Models
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Researcher, Machine Learning Engineer, Policy Maker
Editorial summary, takeaway, and curation by AIssential. Original article published by Import AI.