Building custom model provider for Strands Agents with LLMs hosted on SageMaker AI endpoints
Summary
Organizations deploying custom large language models (LLMs) on Amazon SageMaker AI real-time endpoints often face response format incompatibility with Strands agents, despite using frameworks like SGLang, vLLM, or TorchServe. While these frameworks typically return OpenAI-compatible responses, Strands agents expect the Bedrock Messages API format, leading to parsing errors like "TypeError: 'NoneType' object is not subscriptable." This issue arises because SageMaker's flexibility allows hosting models with diverse, non-standard prompt/response formats, and the default SageMakerAIModel class cannot parse these. Although Amazon Bedrock's Mantle engine supports OpenAI formats since December 2025, SageMaker endpoints do not guarantee this. The solution involves implementing custom model parsers that extend SageMakerAIModel to translate the model server's output into the Bedrock Messages API format, enabling seamless integration with the Strands Agents SDK.
Key takeaway
For MLOps Engineers deploying custom LLMs on Amazon SageMaker AI real-time endpoints for use with Strands agents, you must implement custom model parsers. This ensures compatibility by translating your model's response format (e.g., OpenAI-compatible) into the Bedrock Messages API format expected by Strands, preventing parsing errors and enabling seamless agent integration. Leverage the provided `awslabs/ml-container-creator` and custom `SageMakerAIModel` extensions to streamline this process.
Key insights
Custom model parsers bridge LLM response format mismatches between SageMaker endpoints and Strands agents.
Principles
- SageMaker offers flexible LLM hosting.
- Strands agents expect Bedrock Messages API format.
Method
Deploy Llama 3.1 with SGLang on SageMaker using `awslabs/ml-container-creator`, then implement a custom `LlamaModelProvider` class extending `SageMakerAIModel` to parse OpenAI-compatible responses into the Bedrock Messages API format for Strands agents.
In practice
- Use `awslabs/ml-container-creator` for SageMaker BYOC.
- Extend `SageMakerAIModel` for custom parsing.
- Override the `stream()` method for response translation.
Topics
- LLM Deployment
- Amazon SageMaker
- Strands Agents SDK
- Custom Model Parsers
- Response Format Compatibility
Code references
Best for: Machine Learning Engineer, MLOps Engineer, AI Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.