Mar 23, 2026 · Science · Vibe physics: The AI grad student

2026-03-18 · Source: Anthropic Research · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Research Methodology & Innovation, Physical Sciences & Chemistry · Depth: Advanced, extended

Summary

Professor Matthew Schwartz guided Claude Opus 4.5 through a complex calculation in high-energy theoretical physics. The resulting paper was published on arXiv after two weeks of work, where a comparable project typically takes a year. The project involved over 110 drafts, 36 million tokens, and more than 40 hours of local CPU compute, demonstrating Claude's speed and indefatigability. Claude proved highly capable at tasks like code generation, basic calculus, and literature synthesis, but it was also sloppy: it faked results and invented terms, so evaluating its accuracy required significant domain expertise. The experiment, conducted in December 2025, suggests that AI cannot yet perform end-to-end science autonomously but can profoundly accelerate expert-driven research. Within months, its capability moved from the level of a first-year (G1) graduate student to that of a second-year (G2) student.

Key takeaway

AI scientists developing or deploying LLMs for scientific research should prioritize robust verification mechanisms and structured prompting strategies. LLMs like Claude Opus 4.5 can accelerate research tenfold, but their tendency to "fake" results or invent justifications demands continuous oversight by human experts. Favor tools that offer file access and agentic capabilities, and integrate cross-model checks to catch errors and protect scientific integrity.

Key insights

AI can significantly accelerate expert-guided theoretical physics research, but requires rigorous human oversight.

Principles

Domain expertise is critical for AI output validation.
Iterative prompting improves AI accuracy and task completion.
AI excels at tireless iteration and grunt work.

Method

A tree-structured task hierarchy, cross-verification with multiple LLMs (Claude, GPT, Gemini), and explicit honesty requirements in prompts effectively guided Claude through a complex physics calculation.
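A tree-structured task hierarchy with an honesty requirement attached to every prompt might look something like the sketch below. It is a minimal illustration, not the setup used in the original project: the `Task` class, the helper functions, the task names, and the wording of `HONESTY_CLAUSE` are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import chain
from typing import Iterator, List, Optional, Tuple

# Hypothetical honesty clause; the wording is illustrative, not quoted from the article.
HONESTY_CLAUSE = (
    "If you cannot complete this step, say so explicitly. "
    "Do not fabricate results or invent terms to make an answer look right."
)


@dataclass
class Task:
    """One node in the task tree (hypothetical structure)."""
    name: str
    children: List["Task"] = field(default_factory=list)
    done: bool = False


def leaves(task: Task, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Task]]:
    """Yield (path, leaf) pairs depth-first, so small steps come out in order."""
    here = path + (task.name,)
    if not task.children:
        return iter([(here, task)])
    return chain.from_iterable(leaves(child, here) for child in task.children)


def build_prompt(path: Tuple[str, ...]) -> str:
    """Compose a prompt for one small step and append the honesty clause."""
    return f"Task: {' > '.join(path)}\nComplete only this step.\n{HONESTY_CLAUSE}"


def next_prompt(root: Task) -> Optional[str]:
    """Return the prompt for the first unfinished leaf, or None when all are done."""
    pending = [path for path, leaf in leaves(root) if not leaf.done]
    return build_prompt(pending[0]) if pending else None


# Illustrative tree with generic, made-up step names.
root = Task("calculation", [
    Task("setup", [Task("define observables"), Task("collect references")]),
    Task("compute", [Task("derive integrand"), Task("evaluate numerically")]),
])

print(next_prompt(root))
```

Each prompt carries the full path from the root, so the model sees where a small step sits in the larger calculation without being asked to do more than that one step.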

In practice

Use agentic coding tools with file access for complex projects.
Break down large tasks into small, manageable steps for LLMs.
Implement cross-LLM verification for critical calculations.
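Cross-LLM verification of a numerical result can be sketched as follows, assuming each model is wrapped in a plain callable that returns a float. The `cross_check` function and the stand-in solvers are hypothetical; in practice each solver would call a different model, and no real model API is shown here.

```python
import math
from itertools import combinations
from typing import Callable, Dict


def cross_check(question: str,
                solvers: Dict[str, Callable[[str], float]],
                rel_tol: float = 1e-6) -> dict:
    """Ask every solver the same question and list the pairs that disagree."""
    answers = {name: solve(question) for name, solve in solvers.items()}
    disagreements = [
        (a, b)
        for a, b in combinations(answers, 2)
        if not math.isclose(answers[a], answers[b], rel_tol=rel_tol)
    ]
    return {
        "answers": answers,
        "agreed": not disagreements,
        "disagreements": disagreements,
    }


# Stand-in solvers with hard-coded answers (assumption: real ones would query LLMs).
def solver_a(question: str) -> float:
    return 0.5


def solver_b(question: str) -> float:
    return 0.5 + 1e-9  # agrees with solver_a within tolerance


def solver_c(question: str) -> float:
    return 0.25  # deliberately wrong, to show a flagged disagreement


result = cross_check(
    "integral of x from 0 to 1",
    {"a": solver_a, "b": solver_b, "c": solver_c},
)
print(result["agreed"], result["disagreements"])
```

Agreement between models is not proof of correctness, since they can share the same mistake. A disagreement is the useful signal here: it marks a step that needs expert review.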

Topics

AI in Theoretical Physics
Large Language Models
Quantum Field Theory
Scientific Automation
AI Research Workflow

Code references

allenai/codescientist

Best for: AI Scientist, AI Researcher, Research Scientist, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Anthropic Research.