Description
Image I'm using:
Bottlerocket OS 1.49.0 (aws-k8s-1.34)
What I expected to happen:
The pod starts.
What actually happened:
After we upgraded our cluster to EKS 1.34 and Bottlerocket OS 1.49.0 (aws-k8s-1.34), container images are sometimes not pulled and container/pod creation gets stuck. In "sheltie", I ran "ctr -n k8s.io image pull" on the problematic image and saw that the pull of one image layer was stuck partway through; the progress bar was not moving. containerd has a setting, image_pull_progress_timeout, that Bottlerocket does not use and that users cannot set. The setting has the following effect (from the containerd sources):
// ImagePullProgressTimeout is the maximum duration that there is no
// image data read from image registry in the open connection. It will
// be reset whatever a new byte has been read. If timeout, the image
// pulling will be cancelled. A zero value means there is no timeout.
I think that setting this to a short period, such as 10s, could help with this issue.
How to reproduce the problem:
I was unable to reproduce this problem in our development environment. The worker nodes in the affected environment have been downgraded to the Kubernetes 1.33 variants.