Description
We're experiencing intermittent deployment failures in which the Docker image is built and pushed to the Fly.io registry successfully, but the subsequent `createRelease` step fails with a "Could not find image" error. This appears to be a race condition or propagation delay in the Fly.io registry.
Environment
- Action: `superfly/fly-pr-review-apps@1.5.0`
- Workflow: GitHub Actions on `ubuntu-latest`
- Frequency: Intermittent (occurs on some PRs but not others, unpredictable)
Evidence from Logs
Here's the sequence from a failed deployment (PR #163, run 18748582626):
```
# 1. Image successfully built
2025-10-23T12:45:35.2645300Z --> Build Summary: ()
2025-10-23T12:45:35.3342815Z --> Building image done
2025-10-23T12:45:35.4321146Z image: registry.fly.io/pr-163-growthnation:deployment-01K88HDCASQBXXMZZV07NY1C8H
2025-10-23T12:45:35.4322079Z image size: 381 MB
# 2. Manifest successfully pushed
2025-10-23T12:45:35.1471777Z #24 pushing manifest for registry.fly.io/pr-163-growthnation:deployment-01K88HDCASQBXXMZZV07NY1C8H@sha256:480777ff7b67a6ee8dbaba4ff875c4fbe13038ea7bf7c494e288680d678da9ed
2025-10-23T12:45:35.2558067Z #24 pushing manifest for registry.fly.io/pr-163-growthnation:deployment-01K88HDCASQBXXMZZV07NY1C8H@sha256:480777ff7b67a6ee8dbaba4ff875c4fbe13038ea7bf7c494e288680d678da9ed 0.1s done
2025-10-23T12:45:35.2645300Z #24 DONE 12.5s
# 3. IPs provisioned (~1 second later)
2025-10-23T12:45:36.0178662Z Provisioning ips for pr-163-growthnation
2025-10-23T12:45:36.9636106Z Dedicated ipv6: 2a09:8280:1::a8:ef96:0
2025-10-23T12:45:37.0259209Z Shared ipv4: 66.241.125.171
# 4. Release creation fails (~4 seconds after the push completed, ~2 seconds after IP provisioning)
2025-10-23T12:45:39.2994870Z Error: input:3:2: createRelease Could not find image "registry.fly.io/pr-163-growthnation:deployment-01K88HDCASQBXXMZZV07NY1C8H"
```
Timeline: the push completed at 12:45:35.26, but `createRelease` could not find the image at 12:45:39.30, about 4 seconds later.
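To double-check these gaps, the log timestamps can simply be subtracted. This is an illustrative sketch: `parse_actions_ts` and `gap_seconds` are hypothetical helpers, and it assumes the GitHub Actions timestamp format shown in the log excerpt (ISO 8601, UTC, 7 fractional digits).

```python
from datetime import datetime


def parse_actions_ts(ts: str) -> datetime:
    """Parse a GitHub Actions log timestamp like 2025-10-23T12:45:35.2645300Z.

    The logs carry 7 fractional digits; strptime's %f accepts at most 6,
    so the last digit (100 ns resolution) is dropped.
    """
    base, _, frac = ts.rstrip("Z").partition(".")
    return datetime.strptime(f"{base}.{frac[:6]:0<6}", "%Y-%m-%dT%H:%M:%S.%f")


def gap_seconds(start: str, end: str) -> float:
    """Seconds elapsed between two log timestamps."""
    return (parse_actions_ts(end) - parse_actions_ts(start)).total_seconds()


push_done = "2025-10-23T12:45:35.2645300Z"       # "#24 DONE 12.5s"
ips_done = "2025-10-23T12:45:37.0259209Z"        # "Shared ipv4: ..."
release_failed = "2025-10-23T12:45:39.2994870Z"  # createRelease error

print(f"push -> failure: {gap_seconds(push_done, release_failed):.2f} s")  # 4.03 s
print(f"IPs  -> failure: {gap_seconds(ips_done, release_failed):.2f} s")   # 2.27 s
```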
Expected Behavior
After successfully pushing an image to the Fly.io registry, the image should be immediately available for deployment.
Actual Behavior
The image push succeeds, but `createRelease` reports that the image does not exist when it tries to use it a few seconds later. This suggests one of the following:
- A race condition where the registry hasn't propagated the image to all backend services
- A timing window where the image manifest isn't yet queryable despite the push completing
- An intermittent issue with the Fly.io registry infrastructure
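The second hypothesis could be tested by polling the registry for the manifest between the push and `createRelease`. This is a minimal sketch, assuming registry.fly.io serves the standard OCI Distribution `HEAD /v2/<name>/manifests/<reference>` endpoint and that the caller already has a valid `Authorization` header value for it (obtaining one is not covered here). `manifest_url`, `head_manifest` and `wait_for_manifest` are hypothetical helpers, not part of `superfly/fly-pr-review-apps`.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Media types a registry may return for a tag (OCI and Docker v2 formats).
MANIFEST_ACCEPT = ", ".join([
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.docker.distribution.manifest.v2+json",
])


def manifest_url(image: str) -> str:
    """Map 'registry/repo:tag' to the OCI Distribution manifest URL."""
    registry, _, rest = image.partition("/")
    repo, sep, tag = rest.rpartition(":")
    if not sep or not repo:
        raise ValueError(f"expected registry/repo:tag, got {image!r}")
    return f"https://{registry}/v2/{repo}/manifests/{tag}"


def head_manifest(url: str, auth_header: Optional[str] = None) -> bool:
    """Return True if a HEAD request for the manifest gets a 2xx response."""
    req = urllib.request.Request(url, method="HEAD")
    req.add_header("Accept", MANIFEST_ACCEPT)
    if auth_header:  # assumption: a credential accepted by registry.fly.io
        req.add_header("Authorization", auth_header)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, TimeoutError):
        # HTTPError (404, 401, ...) is a URLError subclass; network errors too.
        # All of these are treated as "not available yet".
        return False


def wait_for_manifest(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it succeeds or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Usage would be along the lines of `wait_for_manifest(lambda: head_manifest(manifest_url(image), auth_header))`. If the probe succeeds right after the push while `createRelease` still fails, that would suggest the delay is not in the registry itself but in whatever resolves images for the release. Note that such a check can only be inserted if the build/push and the release run as separate steps.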
Impact
- Unpredictable PR review app deployments
- Manual intervention required (re-running usually succeeds)
- Wastes CI minutes on failed deployments
- Delays developer feedback cycle
Related Community Reports
This appears to be a known intermittent issue based on Fly.io community forum posts:
These reports date back to 2023 and continue into 2024-2025, suggesting this is a persistent infrastructure issue.
Possible Solutions
- Add retry logic: Retry the `createRelease` step with exponential backoff
- Add delay: Wait a few seconds between push completion and release creation
- Verify manifest: Check that the image manifest is queryable before proceeding to createRelease
- Document workaround: Add notes about re-running failed workflows
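The retry option could look roughly like this. It is a sketch, not the action's code: `retry_with_backoff` and `is_missing_image` are hypothetical names, and it assumes the failing step can be wrapped in a callable that raises on failure (for example a `subprocess.run(..., check=True, capture_output=True, text=True)` call, whose `CalledProcessError` carries the captured `stderr`). Only the "Could not find image" error is retried, so genuine build or configuration failures still fail fast.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_missing_image(exc: Exception) -> bool:
    """True for the transient 'Could not find image' createRelease error."""
    # CalledProcessError keeps captured output in .stderr; other exceptions
    # usually carry the message in str(exc).
    text = f"{exc} {getattr(exc, 'stderr', None) or ''}"
    return "Could not find image" in text


def retry_with_backoff(
    action: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 16.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[float], float] = lambda d: random.uniform(0, d),
) -> T:
    """Call `action`, retrying retryable errors with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return action()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not transient.
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(jitter(delay))  # full jitter: wait a random time in [0, delay]
    raise RuntimeError("unreachable")
```

With the defaults, the waits are capped at 1, 2, 4 and 8 seconds (at most 15 seconds in total), which comfortably covers the ~4 second window seen in the log; the random jitter avoids synchronized retries when several PRs deploy at once.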
Reproduction
This issue occurs intermittently and cannot be reliably reproduced. It affects roughly 10-20% of our PR deployments. Re-running the same workflow usually succeeds on the second attempt.
Question
Is there a known timing window for registry propagation? Should this action include retry logic or a verification step to ensure the image is fully available before creating the release?