Claude Opus 4.6 vs. GPT-5.3 Codex: How I shipped 93,000 lines of code in 5 days

2026-02-09 · Source: Lenny's Newsletter · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, extended

Summary

OpenAI's GPT-5.3 Codex and Anthropic's Claude Opus 4.6 coding models were tested by shipping 44 pull requests over five days, mainly a marketing site redesign and a refactor of internal components. Codex, available as a desktop app, is built around Git primitives such as repositories, branches, and worktrees, and offers "skills" and "automations" for consistent task execution. However, GPT-5.3 Codex interpreted prompts too literally and overfit to them during the marketing site redesign, struggling with broad, creative tasks. Claude Opus 4.6, tested within the Cursor desktop app, excelled at generative, greenfield work, producing a superior site redesign and an effective initial refactor. A combined workflow emerged in which Opus handled initial development (taking work to 80-90% completion) and Codex performed architectural review, bug detection, and code polishing, playing the role of a principal software engineer.

Key takeaway

For AI Architects and VPs of Engineering evaluating new coding models, consider a dual-model strategy. Deploy Claude Opus 4.6 for initial product development and feature implementation, leveraging its strength in generative and creative tasks. Then bring in GPT-5.3 Codex for rigorous code review, architectural validation, and edge-case detection, treating it as a "principal engineer" that hardens code quality. This approach can significantly accelerate development cycles and improve code robustness, at the price of the higher token costs associated with advanced models.

Key insights

Combining Claude Opus 4.6 for generative coding with GPT-5.3 Codex for code review optimizes development workflows.

Principles

Models overfit to literal prompts.
Generative models excel at greenfield work.
Review models identify architectural flaws.

Method

Use Opus 4.6 for initial feature implementation and broad redesigns, then employ Codex for architectural review, bug detection, and code polishing before shipping to production.
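
The generate-then-review loop described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's tooling: `build_then_review`, `ReviewResult`, and the `generate` and `review` callables are hypothetical names, and the callables stand in for whatever way you invoke the generative model and the review model. No real model API is called, and the retry-with-feedback loop is one possible way to wire the two stages together.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ReviewResult:
    """Hypothetical shape of a reviewer's verdict (not a real API type)."""
    approved: bool
    issues: List[str] = field(default_factory=list)


def build_then_review(
    task: str,
    generate: Callable[[str, List[str]], str],   # stand-in for the generative model
    review: Callable[[str], ReviewResult],       # stand-in for the review model
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Draft with one model, gate on another; return (code, rounds used)."""
    feedback: List[str] = []
    for round_number in range(1, max_rounds + 1):
        code = generate(task, feedback)   # stage 1: initial or revised draft
        result = review(code)             # stage 2: review before shipping
        if result.approved:
            return code, round_number
        feedback = result.issues          # feed review findings into the next draft
    raise RuntimeError(f"review did not pass after {max_rounds} rounds: {feedback}")


# Stub callables so the sketch runs without any model access.
def fake_generate(task: str, feedback: List[str]) -> str:
    return f"# {task}\n" + ("validated = True\n" if feedback else "")


def fake_review(code: str) -> ReviewResult:
    if "validated" in code:
        return ReviewResult(approved=True)
    return ReviewResult(approved=False, issues=["missing input validation"])


code, rounds = build_then_review("redesign hero section", fake_generate, fake_review)
```

Passing the two models in as plain callables keeps the gate logic independent of any vendor SDK, so either side can be swapped without touching the loop.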

In practice

Redesign marketing sites with Opus 4.6.
Refactor complex components using Opus.
Use Codex for final code review and bug fixes.

Topics

AI Code Generation
GPT-5.3 Codex
Claude Opus 4.6
Software Development Workflows
Code Review

Best for: AI Architect, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Lenny's Newsletter.