Building custom model provider for Strands Agents with LLMs hosted on SageMaker AI endpoints

2026-03-05 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Intermediate, medium

Summary

Organizations deploying custom large language models (LLMs) on Amazon SageMaker AI real-time endpoints often face response format incompatibility with Strands agents, despite using frameworks like SGLang, vLLM, or TorchServe. While these frameworks typically return OpenAI-compatible responses, Strands agents expect the Bedrock Messages API format, leading to parsing errors like "TypeError: 'NoneType' object is not subscriptable." This issue arises because SageMaker's flexibility allows hosting models with diverse, non-standard prompt/response formats, and the default SageMakerAIModel class cannot parse these. Although Amazon Bedrock's Mantle engine supports OpenAI formats since December 2025, SageMaker endpoints do not guarantee this. The solution involves implementing custom model parsers that extend SageMakerAIModel to translate the model server's output into the Bedrock Messages API format, enabling seamless integration with the Strands Agents SDK.

Key takeaway

For MLOps Engineers deploying custom LLMs on Amazon SageMaker AI real-time endpoints for use with Strands agents, you must implement custom model parsers. This ensures compatibility by translating your model's response format (e.g., OpenAI-compatible) into the Bedrock Messages API format expected by Strands, preventing parsing errors and enabling seamless agent integration. Leverage the provided `awslabs/ml-container-creator` and custom `SageMakerAIModel` extensions to streamline this process.

Key insights

Custom model parsers bridge LLM response format mismatches between SageMaker endpoints and Strands agents.

Principles

SageMaker offers flexible LLM hosting.
Strands agents expect Bedrock Messages API format.

Method

Deploy Llama 3.1 with SGLang on SageMaker using `awslabs/ml-container-creator`, then implement a custom `LlamaModelProvider` class extending `SageMakerAIModel` to parse OpenAI-compatible responses into the Bedrock Messages API format for Strands agents.

In practice

Use `awslabs/ml-container-creator` for SageMaker BYOC.
Extend `SageMakerAIModel` for custom parsing.
Override the `stream()` method for response translation.

Topics

LLM Deployment
Amazon SageMaker
Strands Agents SDK
Custom Model Parsers
Response Format Compatibility

Code references

Best for: Machine Learning Engineer, MLOps Engineer, AI Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.