Ted Moskovitz of Anthropic at RAAIS 2026

2025-10-09 · Source: Air Street Press · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, short

Summary

Ted Moskovitz, who leads Anthropic's Science of Scaling team, has been announced as a speaker at the 10th annual Research and Applied AI Summit (RAAIS), to be held on June 12, 2026, in London. His research sits at the intersection of reinforcement learning, optimization, and large-scale deep learning, with a focus on how model capabilities generalize and stay reliable as systems scale. One problem he studies is "reward model overoptimization," in which a model keeps scoring higher against its reward model while human evaluations of its outputs get worse. His ICLR 2024 Spotlight paper, "Confronting Reward Model Overoptimization with Constrained RLHF," addresses this by treating objectives as constraints to satisfy rather than targets to maximize, which keeps models aligned with human judgment. Other notable works include "ReLOAD" (ICML 2023) and "Towards an Understanding of Default Policies in Multitask Policy Optimization" (AISTATS 2022 Best Paper Award Honorable Mention). All three papers emphasize robust behavior under extreme optimization.

Key takeaway

For AI scientists and machine learning engineers building large-scale models, understanding how optimization fails is critical. Prioritize methods that keep models reliable and aligned with human judgment as capabilities scale. Consider constrained reinforcement learning techniques, such as those Moskovitz proposes, to keep reward signals from decoupling from human judgment. Doing so helps preserve the robust behavior and generalization that production-grade AI systems need.

Key insights

Ensuring AI model reliability and alignment with human judgment is crucial as optimization and scaling advance.

Principles

Model capabilities must generalize reliably with scale.
Optimization can quietly break human alignment.
Treat objectives as constraints to satisfy, not only as targets to maximize.

Method

The paper "Confronting Reward Model Overoptimization with Constrained RLHF" proposes treating training objectives as constraints to satisfy, rather than folding them into a single score to maximize, so that models stay aligned with human judgment as they scale.
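To make "objective as constraint" concrete, here is a minimal sketch of one generic way to enforce a constraint during optimization: a Lagrangian relaxation with alternating primal and dual gradient steps. It is not taken from the paper and should not be read as its algorithm. The function name `constrained_ascent`, the scalar parameter `theta`, the toy objective, and the threshold value are all hypothetical stand-ins chosen so the behavior is easy to check by hand.

```python
def constrained_ascent(grad_objective, constraint, grad_constraint, threshold,
                       theta=0.0, lam=0.0, lr=0.05, steps=2000):
    """Maximize an objective subject to constraint(theta) <= threshold.

    Generic primal-dual sketch (an assumption for illustration, not the
    paper's method). `lam` is the Lagrange multiplier on the constraint.
    """
    for _ in range(steps):
        # Primal step: gradient ascent on the Lagrangian
        #   objective(theta) - lam * (constraint(theta) - threshold)
        theta += lr * (grad_objective(theta) - lam * grad_constraint(theta))
        # Dual step: grow lam while the constraint is violated, shrink it
        # otherwise, and never let it go negative.
        lam = max(0.0, lam + lr * (constraint(theta) - threshold))
    return theta, lam


# Hypothetical toy problem: the objective -(theta - 3)^2 peaks at theta = 3,
# but a second score (here simply theta itself) must stay at or below 1.
def grad_objective(theta):
    return -2.0 * (theta - 3.0)


def constraint(theta):
    return theta


def grad_constraint(theta):
    return 1.0


theta_star, lam_star = constrained_ascent(
    grad_objective, constraint, grad_constraint, threshold=1.0
)
print(f"theta = {theta_star:.3f}, multiplier = {lam_star:.3f}")
```

In this toy the unconstrained optimum sits at 3, yet the iterate settles at the threshold of 1, and the multiplier settles near 4, the slope of the objective at that point. The multiplier acts as an adaptive penalty weight: it rises only while the constrained score is over its threshold, which is the design difference from maximizing a fixed weighted sum.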

In practice

Implement constrained RLHF for alignment.
Monitor human judgment during scaling.
Evaluate generalization across tasks.
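As a rough illustration of the monitoring step, the sketch below scans paired per-checkpoint scores and flags the first checkpoint where the proxy reward reaches a new high while the human evaluation score has fallen below its best value so far. The function name `overoptimization_onset`, the `tolerance` parameter, and the score series are hypothetical; real evaluations would also need to account for noise in human ratings.

```python
from typing import Optional, Sequence


def overoptimization_onset(proxy: Sequence[float],
                           human: Sequence[float],
                           tolerance: float = 0.0) -> Optional[int]:
    """Return the index of the first checkpoint where the proxy reward sets a
    new high while the human score sits more than `tolerance` below its best
    value so far. Returns None if no such checkpoint exists."""
    if len(proxy) != len(human):
        raise ValueError("score series must have the same length")
    best_proxy = float("-inf")
    best_human = float("-inf")
    for i, (p, h) in enumerate(zip(proxy, human)):
        proxy_improved = p > best_proxy
        human_regressed = h < best_human - tolerance
        if proxy_improved and human_regressed:
            return i
        best_proxy = max(best_proxy, p)
        best_human = max(best_human, h)
    return None


# Hypothetical scores for five training checkpoints (illustrative data only).
proxy_scores = [0.1, 0.3, 0.5, 0.7, 0.9]
human_scores = [0.2, 0.4, 0.5, 0.45, 0.3]
onset = overoptimization_onset(proxy_scores, human_scores)
```

A nonzero `tolerance` makes the check less sensitive to small fluctuations in human scores, at the cost of flagging the divergence later.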

Topics

Anthropic
AI Scaling
Reinforcement Learning
Constrained Optimization
Model Alignment
Reward Model Overoptimization

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Air Street Press.