I'm trying to deploy a pre-trained PyTorch model to SageMaker using the Python SDK. I have a model.tar.gz file uploaded to S3 with the following structure:
code/
code/requirements.txt
code/inference.py
code/utils.py
model.pt
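For context, code/inference.py follows the standard SageMaker PyTorch handler interface (model_fn / input_fn / predict_fn / output_fn). The version below is a trimmed-down sketch rather than my exact file; the TorchScript loading and the dummy preprocessing are simplified placeholders:

import json
import os

import torch


def model_fn(model_dir):
    # model.tar.gz is extracted into model_dir; load the weights from there.
    # (Placeholder: assumes model.pt is a TorchScript archive.)
    model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location="cpu")
    model.eval()
    return model


def input_fn(request_body, content_type):
    # Deserialize the request; I send JSON like {"images": [...S3 paths...]}.
    if content_type == "application/json":
        return json.loads(request_body)
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(input_data, model):
    # Placeholder preprocessing: the real handler downloads the images listed
    # in input_data["images"] and builds a batch tensor from them.
    batch = torch.zeros(1, 3, 224, 224)
    with torch.no_grad():
        return model(batch)


def output_fn(prediction, accept):
    # Serialize the prediction back to JSON for the response.
    return json.dumps({"predictions": prediction.tolist()})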
I also have the following deployment script (edited to remove the ARN, bucket name, etc., but I can confirm those values are correct):
import os
import json
import sagemaker
from sagemaker.pytorch import PyTorchModel
role = AWS_SAGEMAKER_ROLE_ARN
bucket = AWS_S3_BUCKET_NAME
session = sagemaker.Session(default_bucket=bucket)
model_data = f"s3://{bucket}/model.tar.gz"
model = PyTorchModel(
    model_data=model_data,
    role=role,
    framework_version="2.6",
    py_version="py312",
    # entry_point="inference.py",  # un-commenting this triggers the re-upload described below
    sagemaker_session=session,
    name="sagemaker-test-model",
)
predictor = model.deploy(
    instance_type="ml.m5.xlarge",
    initial_instance_count=1,
    endpoint_name="sagemaker-test-model-endpoint",
)
payload = {
    "images": [PATH_TO_IMAGES_S3]
}
response = predictor.predict(json.dumps(payload))
print(response)
The above code times out:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a
response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."
If I un-comment entry_point in PyTorchModel, SageMaker tries to re-upload a model.tar.gz file to S3, which fails with permissions errors that I currently can't fix because of restrictions on my own permissions.
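For completeness, the un-commented variant looks like this; source_dir is my guess at how the local code directory should be wired up, and as far as I can tell it is this call that makes the SDK repack and re-upload the tarball:

# Variant with an explicit entry point. The SDK then repacks the code into a
# new model.tar.gz and uploads it to S3, which is the step that hits the
# permissions errors. "code" here is a local copy of the code/ directory
# that is already inside the tarball.
model = PyTorchModel(
    model_data=model_data,
    role=role,
    framework_version="2.6",
    py_version="py312",
    entry_point="inference.py",
    source_dir="code",
    sagemaker_session=session,
    name="sagemaker-test-model",
)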
My question is: am I getting a timeout because I need to provide entry_point, even though the code is already packaged inside the model.tar.gz file, or is my error elsewhere? Perhaps in the inference.py file?