llm 0.32a2

2026-05-12 · Source: Simon Willison's Weblog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, quick

Summary

Version 0.32a2 of the `llm` command-line tool changes how it talks to OpenAI. In this alpha release, reasoning-capable OpenAI models, specifically GPT-5 class models, are sent to the `/v1/responses` endpoint instead of `/v1/chat/completions`. The switch enables interleaved reasoning across tool calls and lets `llm` show summarized reasoning tokens directly in the command-line output. The reasoning text is displayed in a distinct color and can be hidden with the `-R` or `--hide-reasoning` flag.

Key takeaway

NLP engineers who call OpenAI's GPT-5 class models through the `llm` command-line tool should be aware that 0.32a2 routes those models to `/v1/responses`. The reasoning summaries this enables, interleaved with tool calls, can help with debugging and with understanding model behavior. Use them to refine prompts and tool-call strategies, or hide them with `-R` if your workflow needs cleaner output.

Key insights

`llm` now sends requests for OpenAI's GPT-5 class models to `/v1/responses`, which enables interleaved reasoning across tool calls and makes summarized reasoning visible in the output.

Principles

The API endpoint determines which model capabilities are exposed
Interleaved reasoning improves transparency

Method

The `llm` tool now routes reasoning-capable OpenAI models to the `/v1/responses` endpoint, allowing the display of summarized reasoning tokens in a distinct color, which can be toggled off.
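
The routing decision described above can be pictured with a minimal sketch. This is not `llm`'s actual implementation: the function name `endpoint_for`, the prefix list, and the model IDs `gpt-5` and `gpt-4o` are all assumptions made for illustration. Only the two endpoint paths come from the release.

```python
# Hypothetical sketch of endpoint routing; NOT llm's real code.
RESPONSES_ENDPOINT = "/v1/responses"
CHAT_COMPLETIONS_ENDPOINT = "/v1/chat/completions"

# Assumption: model IDs starting with these prefixes count as
# reasoning-capable. The real tool may decide this differently.
REASONING_MODEL_PREFIXES = ("gpt-5",)


def endpoint_for(model_id: str) -> str:
    """Return the API path a request for model_id would be sent to."""
    if model_id.lower().startswith(REASONING_MODEL_PREFIXES):
        return RESPONSES_ENDPOINT
    return CHAT_COMPLETIONS_ENDPOINT


if __name__ == "__main__":
    # Illustrative model IDs only.
    for model_id in ("gpt-5", "gpt-4o"):
        print(model_id, "->", endpoint_for(model_id))
```

A prefix check keeps the sketch short. A per-model capability flag would be an equally plausible design.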

In practice

Use `llm` for OpenAI model access
Observe reasoning tokens in output
Hide reasoning with `-R` flag
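
The steps above could be scripted roughly as follows. This sketch assumes `llm` 0.32a2 is installed and configured with an OpenAI key, and that `gpt-5` is a valid model ID on your setup. The helper names `build_llm_command` and `run_llm` are hypothetical. Only the `--hide-reasoning` flag (short form `-R`) is taken from the release.

```python
import shutil
import subprocess


def build_llm_command(prompt, model=None, hide_reasoning=False):
    """Build the argument list for one llm invocation (hypothetical helper)."""
    command = ["llm"]
    if model is not None:
        command += ["-m", model]  # select the model by ID
    if hide_reasoning:
        command.append("--hide-reasoning")  # same as -R
    command.append(prompt)
    return command


def run_llm(prompt, model=None, hide_reasoning=False):
    """Run llm attached to the current terminal and return its exit code."""
    if shutil.which("llm") is None:
        raise RuntimeError("llm executable not found on PATH")
    command = build_llm_command(prompt, model, hide_reasoning)
    # Output is not captured, so llm writes straight to the terminal.
    return subprocess.run(command, check=True).returncode


if __name__ == "__main__":
    # "gpt-5" is an assumed model ID; substitute one available to you.
    print(build_llm_command("Summarize this release", model="gpt-5", hide_reasoning=True))
```

Calling `run_llm` with `hide_reasoning=False` leaves the reasoning summaries in the output, which is the more useful setting while debugging tool calls.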

Topics

llm CLI
OpenAI API
GPT-5 Models
Interleaved Reasoning
Tool Calls

Code references

simonw/llm

Best for: NLP Engineer, AI Engineer, Machine Learning Engineer, Prompt Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.