TAI #202: GPT-5.5 Moves Codex Into Real Work

2024-09-10 · Source: Towards AI Newsletter · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, long

Summary

OpenAI recently released GPT-5.5 alongside workspace agents in ChatGPT and Privacy Filter for PII redaction, signaling a shift by frontier AI labs toward building models as integrated work systems. GPT-5.5, particularly within its Codex environment, is designed for complex computer tasks such as coding, research, document analysis, and software operation. Benchmarks show GPT-5.5 achieving 82.7% on Terminal-Bench 2.0 and 73.1% on Expert-SWE eval, though its 58.6% on SWE-Bench Pro trails Claude Opus 4.7. The model has a 400K-token context window in Codex and a 1,050,000-token window in the API, priced at $5 per million input tokens and $30 per million output tokens. OpenAI reports significant internal adoption, with over 85% of employees using Codex weekly, and external partnerships with major consulting firms.
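
To make the quoted pricing concrete, here is a minimal sketch of the arithmetic, assuming the $5 and $30 per-million-token prices above apply to every run. The `Run` record and both function names are hypothetical bookkeeping for illustration, not part of any OpenAI API. It divides total spend by the number of runs that produced finished work, rather than reporting tokens alone.

```python
from dataclasses import dataclass

# Prices quoted in the summary (USD per million tokens).
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 30.0


@dataclass
class Run:
    """One agent run. Hypothetical record, not an OpenAI schema."""
    input_tokens: int
    output_tokens: int
    completed: bool  # True if the run produced finished, accepted work


def run_cost(run: Run) -> float:
    """Dollar cost of a single run at the quoted per-token prices."""
    total = (run.input_tokens * INPUT_USD_PER_MTOK
             + run.output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


def cost_per_completed_workflow(runs: list[Run]) -> float:
    """Total spend (including failed runs) divided by completed runs."""
    done = sum(1 for r in runs if r.completed)
    if done == 0:
        raise ValueError("no completed workflows to amortize cost over")
    return sum(run_cost(r) for r in runs) / done
```

For example, a run with 1,000,000 input tokens and 100,000 output tokens costs $5 + $3 = $8. If a second, failed run cost $2.50, the cost per completed workflow is $10.50, because failed attempts are still paid for.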

Key takeaway

AI architects and CTOs evaluating agentic AI platforms should focus on the "work loop" capabilities of models like GPT-5.5 within Codex. Prioritize systems that offer robust tool integration, version control, audit logs, and structured memory (skills and subagents), so that agents deliver measurable, finished work rather than more cleanup tasks for humans. Your teams should develop methods for creating and sharing effective skills company-wide, so that agents move beyond being expensive text boxes and become capable delegated workers.

Key insights

Frontier AI models are evolving into integrated work systems, emphasizing tools, memory, and permissions over raw model capability.

Principles

Measure AI value by completed workflows, not token counts.
Keep product rules in code, not solely in prompts.
Agent deployment requires controlled workspaces and scoped permissions.
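
The last two principles can be sketched together: a permission check that lives in code, so no prompt wording can override it. This is an illustrative sketch only. `Scope`, `authorize`, and the audit-log tuple layout are hypothetical names, not a Codex or ChatGPT API, and a real deployment would also need sandboxing at the operating-system level.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Scope:
    """Hypothetical permission scope granted to one agent."""
    allowed_tools: frozenset
    workspace_root: PurePosixPath


def authorize(scope: Scope, tool: str, path: str, audit_log: list) -> bool:
    """Allow a tool call only if the tool is granted and the path
    stays inside the agent's workspace. Every decision is logged."""
    target = PurePosixPath(path)
    inside = (
        ".." not in target.parts  # reject parent-directory traversal
        and target.is_relative_to(scope.workspace_root)
    )
    allowed = tool in scope.allowed_tools and inside
    audit_log.append((tool, path, allowed))  # record allow and deny alike
    return allowed
```

A scope with `allowed_tools=frozenset({"read_file"})` and `workspace_root=PurePosixPath("/work/agent")` permits `read_file` on `/work/agent/notes.txt`, and denies both `delete_file` anywhere and `read_file` on `/work/agent/../secrets.txt`. The rule is enforced before the tool runs, whatever the prompt says.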

Method

Use subagents for parallel workstreams: specialized agents explore, fact-check, critique, and iterate before human review, which shifts the work from single-threaded prompting to team management.
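
As a rough illustration of the fan-out pattern, the sketch below runs several role-specific subagents concurrently with `asyncio` and collects their outputs for review. `run_subagent` is a placeholder that only echoes its inputs. A real system would call a model there, and the role names and result shape are assumptions made for this example.

```python
import asyncio

MAX_SUBAGENTS = 20  # concurrency cap, matching the limit listed under "In practice"


async def run_subagent(role: str, task: str) -> dict:
    """Placeholder for a real model call; it only echoes its inputs."""
    await asyncio.sleep(0)  # stand-in for network latency
    return {"role": role, "task": task, "output": f"[{role}] notes on: {task}"}


async def fan_out(task: str, roles: list, limit: int = MAX_SUBAGENTS) -> list:
    """Run one subagent per role, at most `limit` at a time.
    Results come back in the same order as `roles`."""
    sem = asyncio.Semaphore(limit)

    async def guarded(role: str) -> dict:
        async with sem:
            return await run_subagent(role, task)

    return await asyncio.gather(*(guarded(r) for r in roles))


if __name__ == "__main__":
    results = asyncio.run(
        fan_out("summarize Q3 churn drivers", ["explorer", "fact_checker", "critic"])
    )
    for r in results:
        print(r["output"])
```

The human reviewer then sees one bundle of explored, checked, and critiqued output instead of steering a single long conversation.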

In practice

Create reusable skills for research, reporting, or data cleanup.
Run up to 20 subagents for deep research or parallel tasks.
Turn on the non-coder view in Codex for a general work surface.

Topics

GPT-5.5
OpenAI Codex
AI Agents
Agentic Coding
LLM Benchmarking

Best for: CTO, AI Architect, Investor, AI Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI Newsletter.