llm 0.32a2
Summary
Version 0.32a2 of the `llm` command-line tool is an alpha release that changes how reasoning-capable OpenAI models, specifically GPT-5 class models, call the API. These models now use the `/v1/responses` endpoint instead of `/v1/chat/completions`. The change enables interleaved reasoning across tool calls and surfaces summarized reasoning tokens directly in the command-line output. Reasoning tokens are displayed in a distinct color and can be hidden with the `-R` or `--hide-reasoning` flag.
Key takeaway
NLP engineers who integrate OpenAI's GPT-5 class models through the `llm` command-line tool should be aware that these models now call the `/v1/responses` endpoint. The summarized reasoning tokens this exposes can help with debugging and with understanding model behavior. Use them to refine prompts and tool-call strategies, or hide them with `-R` when your workflow needs cleaner output.
Key insights
In `llm` 0.32a2, OpenAI's GPT-5 class models use `/v1/responses`, which enables interleaved reasoning across tool calls and makes summarized reasoning visible in the output.
Principles
- The API endpoint a client calls determines which model features it can surface
- Interleaved reasoning improves transparency
Method
The `llm` tool now routes reasoning-capable OpenAI models to the `/v1/responses` endpoint. This lets it display summarized reasoning tokens in a distinct color, and the `-R` or `--hide-reasoning` flag turns that display off.
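The routing and display behavior described here can be pictured with a small Python sketch. Everything in it is hypothetical: `endpoint_for`, `render`, the `(kind, text)` chunk shape, and the ANSI faint style are illustrative assumptions, not `llm`'s actual internals or its real color choice.

```python
# Illustrative sketch only: these names are hypothetical, not llm's real internals.
CHAT_COMPLETIONS = "/v1/chat/completions"
RESPONSES = "/v1/responses"

DIM = "\033[2m"    # ANSI "faint" style, standing in for the distinct color
RESET = "\033[0m"


def endpoint_for(reasoning_capable: bool) -> str:
    """Pick the API path: reasoning-capable models go to /v1/responses."""
    return RESPONSES if reasoning_capable else CHAT_COMPLETIONS


def render(chunks, hide_reasoning: bool = False) -> str:
    """Join (kind, text) chunks, styling or dropping the reasoning ones."""
    parts = []
    for kind, text in chunks:
        if kind == "reasoning":
            if hide_reasoning:
                continue  # mirrors the effect of -R / --hide-reasoning
            parts.append(f"{DIM}{text}{RESET}")
        else:
            parts.append(text)
    return "".join(parts)
```

Calling `render(chunks, hide_reasoning=True)` returns only the non-reasoning text, which is the effect the flag has on terminal output.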
In practice
- Use `llm` for OpenAI model access
- Observe reasoning tokens in output
- Hide reasoning with the `-R` flag
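For scripted use, the steps above might be wrapped as follows. This is a minimal sketch: `build_llm_command` is a hypothetical helper, and the model ID `gpt-5` in the usage note is an assumption, so check `llm models` for the IDs available in your install. Only the `-m` option and the `--hide-reasoning` flag are taken from the tool itself.

```python
import shlex


def build_llm_command(prompt, model=None, hide_reasoning=False):
    """Build an argv list for the llm CLI (hypothetical helper)."""
    argv = ["llm"]
    if model is not None:
        argv += ["-m", model]            # select a model by ID
    if hide_reasoning:
        argv.append("--hide-reasoning")  # long form of -R
    argv.append(prompt)
    return argv


def as_shell(argv):
    """Render the argv list as a copy-pasteable shell command."""
    return shlex.join(argv)
```

For example, `as_shell(build_llm_command("Summarize this", model="gpt-5", hide_reasoning=True))` yields a command string that can be pasted into a shell, and the argv list itself can be passed to `subprocess.run`.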
Topics
- llm CLI
- OpenAI API
- GPT-5 Models
- Interleaved Reasoning
- Tool Calls
Best for: NLP Engineer, AI Engineer, Machine Learning Engineer, Prompt Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.