Amazon SageMaker AI Async Inference now supports inline request payloads
Summary
Amazon SageMaker AI Async Inference now supports inline request payloads, allowing customers to send inference data directly in the body of the "InvokeEndpointAsync" API request. For payloads of up to 128,000 bytes, input data no longer has to be uploaded to Amazon S3 first. The new "Body" parameter simplifies client-side code, removes a network round-trip (the S3 upload), and reduces the operational surface area of asynchronous inference workloads. The feature is most useful for small payloads, on the order of kilobytes, whose processing takes longer than real-time inference allows. It is available in 31 commercial AWS Regions and is backward-compatible with existing async endpoints, requiring no model or container changes. Output behavior is unchanged: results are still written to the configured S3 output location.
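The inline call can be sketched as follows. This is a minimal illustration, not official sample code. It assumes your installed boto3 version exposes the new parameter on `invoke_endpoint_async` under the same `Body` name the article uses, and that the model container accepts JSON. The function name `invoke_inline` is hypothetical. The client is passed in so that the sketch does not depend on credentials.

```python
import json
from typing import Any, Dict


def invoke_inline(client: Any, endpoint_name: str, payload: dict) -> Dict[str, Any]:
    """Send a small JSON payload inline to an async endpoint (sketch).

    Assumes `client` is a boto3 "sagemaker-runtime" client from an SDK
    version that supports the `Body` parameter on invoke_endpoint_async.
    """
    # Serialize to bytes: the 128,000-byte limit applies to the encoded body.
    body = json.dumps(payload).encode("utf-8")
    response = client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption: the container accepts JSON
        Body=body,  # replaces the S3 upload + InputLocation pair
    )
    # Output still lands in S3; keep these values to locate the result later.
    return {
        "inference_id": response.get("InferenceId"),
        "output_location": response.get("OutputLocation"),
    }
```

In real use, `client` would be `boto3.client("sagemaker-runtime")`, and the endpoint name would be that of an existing async endpoint. No S3 client appears anywhere on the request path.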
Key takeaway
For AI/ML engineers running SageMaker Async Inference workloads whose models process small payloads (up to 128,000 bytes), the recommendation is to update the AWS SDK and switch to the new "Body" parameter. Doing so simplifies client-side code, reduces latency by removing the S3 upload step, and lowers operational cost by dropping the per-request S3 write. For mixed workloads, consider branching the invocation logic: use inline payloads for small inputs and the S3 "InputLocation" for larger ones, so each request takes the simplest path its size allows.
Key insights
SageMaker Async Inference now accepts inline payloads up to 128,000 bytes, simplifying workflows and reducing latency.
Principles
- Minimize network hops for efficiency.
- Reduce dependencies to simplify architecture.
- Synchronous error feedback improves debugging.
Method
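Update the AWS SDK (Boto3) to a version that supports the new parameter. Next, replace the S3 upload with the "Body" parameter in the "InvokeEndpointAsync" call. Finally, test the endpoint and verify that the output appears in the configured S3 output location.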
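The final step, verifying the output in S3, can be scripted as a simple poll. This is a sketch that rests on several assumptions:

- `OutputLocation` (and `FailureLocation`, which is returned only when the endpoint has a failure path configured) come from the `invoke_endpoint_async` response.
- `s3` is a boto3 S3 client.
- The helper names, the 5-second interval, and the 300-second timeout are arbitrary choices.

Polling suits a quick test. For production, SageMaker async endpoints can also publish success and error notifications to Amazon SNS.

```python
import time
from typing import Any, Callable, Optional, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def _exists(s3: Any, uri: str) -> bool:
    """Check for an exact key via list_objects_v2 (avoids handling 404 errors)."""
    bucket, key = split_s3_uri(uri)
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=key)
    return any(obj["Key"] == key for obj in resp.get("Contents", []))


def _read(s3: Any, uri: str) -> bytes:
    bucket, key = split_s3_uri(uri)
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()


def wait_for_result(
    s3: Any,
    output_location: str,
    failure_location: Optional[str] = None,
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Poll S3 until the async result or a failure record appears (sketch)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if _exists(s3, output_location):
            return _read(s3, output_location)
        # Only checked when the caller passes a FailureLocation.
        if failure_location and _exists(s3, failure_location):
            raise RuntimeError(f"inference failed: {_read(s3, failure_location)!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result at {output_location} after {timeout_s}s")
        sleep(poll_s)
```

The `sleep` argument is injectable only so the loop can be exercised without real waiting. Normal callers can ignore it.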
In practice
- Use "Body" for payloads of up to 128,000 bytes.
- Use "InputLocation" for larger payloads.
- Branch logic for mixed payload sizes.
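The size-based branching in the list above can be sketched as one dispatch function. This is a hedged illustration under the following assumptions:

- The limit is measured on the encoded bytes, not on characters. In UTF-8, for example, "é" takes two bytes.
- A payload of exactly 128,000 bytes is treated as allowed inline.
- `staging_bucket` and the `async-inputs/` key prefix are hypothetical.
- The same `Body` parameter name as above is assumed for boto3.

```python
import uuid
from typing import Any, Dict

# Inline limit stated in the announcement; applied here to the encoded bytes.
INLINE_LIMIT_BYTES = 128_000


def invoke_async_auto(
    runtime: Any,
    s3: Any,
    endpoint_name: str,
    payload: bytes,
    staging_bucket: str,
    content_type: str = "application/json",
    limit: int = INLINE_LIMIT_BYTES,
) -> Dict[str, Any]:
    """Send small payloads inline via Body; stage larger ones in S3 (sketch)."""
    if not isinstance(payload, (bytes, bytearray)):
        raise TypeError("encode the payload to bytes before measuring its size")
    if len(payload) <= limit:
        # Small input: one API call, no S3 write.
        return runtime.invoke_endpoint_async(
            EndpointName=endpoint_name, ContentType=content_type, Body=bytes(payload)
        )
    # Large input: previous flow. Hypothetical key layout; adapt to your bucket.
    key = f"async-inputs/{uuid.uuid4()}"
    s3.put_object(Bucket=staging_bucket, Key=key, Body=bytes(payload))
    return runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        ContentType=content_type,
        InputLocation=f"s3://{staging_bucket}/{key}",
    )
```

Keeping both paths behind a single function means callers never have to know which transport was used. Either way, the response still points to the same S3 output location.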
Topics
- Amazon SageMaker
- Asynchronous Inference
- AWS SDK Boto3
- Payload Management
- Cloud Architecture
- API Integration
Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.