TAI #202: GPT-5.5 Moves Codex Into Real Work
Summary
OpenAI recently released GPT-5.5 alongside workspace agents in ChatGPT and Privacy Filter for PII redaction, signaling a shift by frontier AI labs toward building models as integrated work systems. GPT-5.5, particularly within its Codex environment, is designed for complex computer tasks such as coding, research, document analysis, and software operation. On benchmarks, GPT-5.5 achieves 82.7% on Terminal-Bench 2.0 and 73.1% on the Expert-SWE eval, though its 58.6% on SWE-Bench Pro trails Claude Opus 4.7. The model has a 400K-token context window in Codex and a 1,050,000-token context window in the API, with pricing at $5 per million input tokens and $30 per million output tokens. OpenAI reports significant internal adoption, with over 85% of employees using Codex weekly, and external partnerships with major consulting firms.
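To make the quoted pricing concrete, here is a minimal sketch of the per-request arithmetic. Only the two rates come from the summary above; the function name and the token counts in the usage line are illustrative assumptions, and real bills may differ if discounts or cached-input rates apply.

```python
# Rates quoted in the summary (USD per million tokens).
INPUT_USD_PER_MILLION = 5.0
OUTPUT_USD_PER_MILLION = 30.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


# Assumed example: a 400K-token prompt with a 20K-token response.
cost = estimate_cost_usd(400_000, 20_000)
print(f"${cost:.2f}")
```

Because output tokens cost six times as much as input tokens at these rates, long agent transcripts can dominate the bill even when prompts are large.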
Key takeaway
AI architects and CTOs evaluating agentic AI platforms should focus on the "work loop" capabilities of models like GPT-5.5 within Codex. Prioritize systems that offer robust tool integration, version control, audit logs, and structured memory (skills and subagents), so that agents produce measurable, finished work rather than more cleanup tasks for humans. Teams should develop methods for creating and sharing effective skills company-wide, so that agents stop being expensive text boxes and become capable delegated workers.
Key insights
Frontier AI models are evolving into integrated work systems, emphasizing tools, memory, and permissions over raw model capability.
Principles
- Measure AI value by completed workflows, not token counts.
- Keep product rules in code, not solely in prompts.
- Agent deployment requires controlled workspaces and scoped permissions.
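The last two principles can be sketched as a small policy check that runs in code before any agent-proposed action executes. Everything here is a hypothetical illustration: `AgentScope`, `ProposedAction`, `check_action`, the tool names, and the refund limit are assumptions for the example, not part of Codex or any OpenAI API.

```python
from dataclasses import dataclass, field


class PolicyError(Exception):
    """Raised when an agent proposes an action outside its scope."""


@dataclass(frozen=True)
class AgentScope:
    """Permissions granted to one agent workspace (hypothetical)."""
    allowed_tools: frozenset
    max_refund_usd: float


@dataclass
class ProposedAction:
    """An action the agent wants to take, before it is executed."""
    tool: str
    args: dict = field(default_factory=dict)


def check_action(scope: AgentScope, action: ProposedAction) -> None:
    """Enforce permissions and product rules in code, not in the prompt."""
    if action.tool not in scope.allowed_tools:
        raise PolicyError(f"tool {action.tool!r} is not permitted in this workspace")
    if action.tool == "issue_refund":
        amount = action.args.get("amount_usd", 0.0)
        if amount > scope.max_refund_usd:
            raise PolicyError(f"refund of {amount} exceeds limit of {scope.max_refund_usd}")


# Assumed example scope: a support agent that may read tickets and issue small refunds.
support_scope = AgentScope(frozenset({"read_ticket", "issue_refund"}), max_refund_usd=50.0)
check_action(support_scope, ProposedAction("issue_refund", {"amount_usd": 20.0}))  # passes silently
```

The design choice is that the rule holds even if the prompt is ignored or rewritten: the model can propose anything, but only actions that pass the check are executed.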
Method
Use subagents for parallel workstreams: specialized agents explore, fact-check, critique, and iterate before human review, which shifts the work from single-threaded prompting to team management.
In practice
- Create reusable skills for research, reporting, or data cleanup.
- Run up to 20 subagents for deep research or parallel tasks.
- Turn on the non-coder view in Codex for a general work surface.
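The subagent pattern described under Method, combined with the cap of 20 listed above, might be sketched as a bounded fan-out. This is a minimal illustration under stated assumptions: `run_subagent` is a stand-in stub rather than a real Codex or OpenAI call, and the role names and task strings are invented for the example.

```python
import asyncio

MAX_SUBAGENTS = 20  # concurrency cap taken from the practice list above


async def run_subagent(role: str, task: str) -> str:
    """Stub for a real agent call (assumption); it only tags the text with its role."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"[{role}] {task}"


async def run_workstream(task: str, limiter: asyncio.Semaphore) -> str:
    """One workstream: explore, then fact-check, then critique, before human review."""
    async with limiter:  # at most MAX_SUBAGENTS workstreams run at once
        draft = await run_subagent("explorer", task)
        checked = await run_subagent("fact-checker", draft)
        return await run_subagent("critic", checked)


async def run_all(tasks: list) -> list:
    """Fan tasks out in parallel and collect results in input order."""
    limiter = asyncio.Semaphore(MAX_SUBAGENTS)
    return await asyncio.gather(*(run_workstream(t, limiter) for t in tasks))


results = asyncio.run(run_all(["pricing survey", "benchmark comparison"]))
for line in results:
    print(line)
```

The semaphore keeps the number of active workstreams bounded regardless of how many tasks are queued, and `asyncio.gather` returns results in the same order as the input tasks, which makes the output easy to hand to a human reviewer.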
Topics
- GPT-5.5
- OpenAI Codex
- AI Agents
- Agentic Coding
- LLM Benchmarking
Best for: CTO, AI Architect, Investor, AI Engineer, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI Newsletter.