LLM 0.32a0 is a major backwards-compatible refactor

2026-04-29 · Source: Simon Willison's Weblog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, medium

Summary

LLM 0.32a0, an alpha release of the LLM Python library and CLI tool, introduces significant backwards-compatible changes to better handle the diverse input and output types of modern large language models. Released on April 29, 2026, this update refactors the library's core to move beyond a simple text-in, text-out model. Key changes include representing model inputs as a sequence of messages, aligning with conversational APIs like OpenAI's chat completions, and modeling model responses as a stream of differently typed parts, accommodating mixed content such as text, tool calls, and even multi-modal outputs like images or audio. The update also provides a new mechanism for serializing and deserializing responses as JSON-style dictionaries, offering greater flexibility for Python API users.

Key takeaway

For NLP Engineers building conversational or multi-modal LLM applications, LLM 0.32a0 simplifies handling complex interactions. You can now directly feed entire conversation histories as message sequences and process rich, streaming outputs that interleave text, tool calls, and other data types. This update reduces boilerplate for integrating advanced LLM features and offers a flexible serialization mechanism for custom storage solutions.

Key insights

LLM 0.32a0 refactors input to message sequences and output to typed streams, enhancing multi-modal and tool-use capabilities.

Principles

LLM APIs should reflect conversational turns.
Model outputs can be a stream of mixed types.

Method

Model inputs are now sequences of `user()` and `assistant()` messages. Responses stream as `event.type` parts (text, tool_call_name, tool_call_args) for granular processing and display.

In practice

Use `model.prompt(messages=[user("..."), assistant("...")])` for conversational input.
Iterate `response.stream_events()` to process mixed output types.
Employ `response.to_dict()` for custom response persistence.

Topics

LLM 0.32a0 Refactor
Python LLM Library
Message-based Prompts
Multi-modal Output Streaming
LLM Tool Calling

Code references

Best for: NLP Engineer, AI Engineer, Machine Learning Engineer, Software Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.