The inaugural Redwood Research podcast

2026-01-04 · Source: AI Alignment Forum · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

The inaugural Redwood Research podcast, featuring Buck and Ryan, is a 170-minute discussion of AI alignment, the history of Redwood Research, and future AI risks. Topics include their personal P(doom) estimates (a 50% chance of catastrophic outcomes), the importance of considering multiverse theories and simulation hypotheses, and the challenges of AI control. They detail Redwood Research's evolution from adversarial robustness and interpretability research to its current focus on AI control and alignment faking. A significant portion covers the practicalities of video editing with Claude Code, the impact of their past research, and their evolving perspectives on AI safety strategies, including different "Plan" scenarios that depend on the level of political will and company commitment to safety. The discussion also touches on the economic viability of neuralese models, chain-of-thought legibility, and the role of mid-career professionals in AI safety.

Key takeaway

For research scientists evaluating AI safety strategies: prioritize modular, easily implementable control mechanisms that are robust to AI scheming, rather than focusing solely on complex interpretability. Target "low-effort regimes" and use simple, iterative empirical methods to identify effective interventions quickly, since AI companies may have limited capacity for highly complex safety integrations. Consider contributing clear, accessible conceptual frameworks that improve collective understanding of AI risks.

Key insights

AI safety requires pragmatic strategies, acknowledging both technical challenges and organizational realities, to manage escalating risks.

Principles

Prioritize simple, iterative research over complex, long-term projects.
Focus on AI control methods that are robust to AI scheming.
Recognize that AI companies may have limited capacity for complex safety implementations.

Method

Redwood Research developed a command-line video editing system using Deepgram for transcription and Claude Code for automated shot cutting, compiling to an ffmpeg command, to streamline podcast production without manual editing costs.
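
The summary does not describe the pipeline's internals, so the following is only a minimal sketch of the final "compile to an ffmpeg command" step, not Redwood's actual implementation. It assumes the earlier stages (transcription and automated cut selection) have already produced a list of (start, end) timestamps in seconds to keep. The function name `build_ffmpeg_command` and the file names are hypothetical. It relies on ffmpeg's standard `trim`, `atrim`, `setpts`, `asetpts`, and `concat` filters, and assumes the input has one video and one audio stream.

```python
from typing import List, Tuple

# A segment of the source recording to keep: (start_seconds, end_seconds).
Segment = Tuple[float, float]


def build_ffmpeg_command(src: str, dst: str, keep: List[Segment]) -> List[str]:
    """Compile a list of kept segments into a single ffmpeg argv list.

    Hypothetical helper: each kept segment is trimmed out of the input,
    its timestamps are reset, and all segments are concatenated in order.
    """
    if not keep:
        raise ValueError("need at least one segment to keep")

    filters = []
    labels = []
    for i, (start, end) in enumerate(keep):
        if end <= start:
            raise ValueError(f"segment {i} has end <= start")
        # Cut the video and audio for this segment and restart timestamps at 0.
        filters.append(
            f"[0:v]trim=start={start}:end={end},setpts=PTS-STARTPTS[v{i}]"
        )
        filters.append(
            f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]"
        )
        labels.append(f"[v{i}][a{i}]")

    # Join all segments into one video stream and one audio stream.
    filters.append(
        f"{''.join(labels)}concat=n={len(keep)}:v=1:a=1[outv][outa]"
    )

    return [
        "ffmpeg", "-i", src,
        "-filter_complex", ";".join(filters),
        "-map", "[outv]", "-map", "[outa]",
        dst,
    ]


if __name__ == "__main__":
    # Keep 0-5 s and 7.5-12 s, dropping the 2.5 s in between.
    cmd = build_ffmpeg_command("raw.mp4", "edited.mp4", [(0.0, 5.0), (7.5, 12.0)])
    print(" ".join(cmd))
```

Returning an argument list rather than a shell string lets the command be passed directly to `subprocess.run` without quoting issues. It also keeps the cut list, which an automated editor would produce, separate from the rendering step.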

In practice

Use simple baselines and methods first in AI safety research.
Develop organizational capacity for AI incident response proactively.
Explore high-level thought decoding for neuralese models.

Topics

AI Alignment
AI Control
Interpretability Research
Misaligned AI
Neuralese Models

Best for: Research Scientist, AI Researcher, AI Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Alignment Forum.