scripts: add validate-runner-bump.py for Renovate diff validation by jeanschmidt · Pull Request #636 · pytorch/ci-infra

jeanschmidt · 2026-05-27T01:59:34Z

Stack from ghstack (oldest at bottom):

Impact: OSDC scripts — new standalone validator
Risk: low

What

Introduces a shared Python validator for Renovate-style runner-image bump
PRs. Confirms the diff touches exactly one file at exactly one hunk, that
the bumped value is strict semver, and that the new version is strictly
greater than the old.

Why

Both the pre-merge auto-approve gate and the post-merge auto-deploy
workflow need the same correctness guarantees on a Renovate diff before
they trust it. Centralizing the logic in one Python script keeps the YAML
workflows minimal and gives us proper unit-test coverage.

How

Pure-Python, no third-party deps; parses unified-diff text and the
target file path from CLI args.
Strict checks: single file, single hunk, single changed line, strict
semver on both sides, new > old.
Exit non-zero with a precise reason on any failure so callers can fail
fast and the surfaced error tells reviewers exactly what is off.
Ships with a comprehensive unit-test file covering happy path and every
rejection branch.

Changes

osdc/scripts/validate-runner-bump.py: new validator.
osdc/scripts/test_validate_runner_bump.py: unit tests.

Notes

No callers yet; wired up in later commits.

Testing

cd osdc && just test runs the new test suite.

Signed-off-by: Jean Schmidt contato@jschmidt.me

[ghstack-poisoned]

malfet

Why it needs to be so complex? All this 400 line change does is checks whether some property in osdc/clusters.yaml contains positive version bump right? Why not do something like
current, old = json.read("osdc/clusters.yaml"), json.read_string(subprocess.check_output("git show HEAD~1 osdc/clusters.yml")) and just compare the appropriate key? And you can make a very simple gh api call to figure out whether diff is 1 line, can't you?

malfet · 2026-05-27T14:24:54Z

+
+    if len(files) != 1:
+        _emit("close-wrong-file-count", f"expected exactly 1 file changed; got {len(files)}")
+        return 0


Why wrong file count is not an error?

malfet · 2026-05-27T14:26:41Z

+
+EXPECTED_FILE = "osdc/clusters.yaml"
+
+# MUST remain bit-for-bit equivalent to the prior bash regex so the pre-


MUST remain bit-for-bit equivalent to the prior bash regex

There are no prior, not sure what you are referring to...

malfet · 2026-05-27T14:27:30Z

+LINE_PATTERN = re.compile(r'^    runner_image_tag:[ \t]*"(?P<ver>\d+\.\d+\.\d+)"[ \t]*(#.*)?$')
+
+
+def _emit(decision: str, reason: str, **extra: str) -> None:


Extra is not an str, is it? Also traditionally this is called kwargs

Suggested change

def _emit(decision: str, reason: str, **extra: str) -> None:

def _emit(decision: str, reason: str, **extra: dict) -> None:

malfet · 2026-05-27T14:29:16Z

+    print(f"decision={decision}")
+    print(f"reason={reason}")
+    for k, v in extra.items():
+        print(f"{k}={v}")


Nit

Suggested change

print(f"decision={decision}")

print(f"reason={reason}")

for k, v in extra.items():

print(f"{k}={v}")

print(*(f"{k}={v}" for k, v in {"decision": decision, "reason": reason, **extra}.items()), sep="\n")

malfet · 2026-05-27T14:31:21Z

+    parts = version.split(".")
+    return (int(parts[0]), int(parts[1]), int(parts[2]))


Nit

Suggested change

parts = version.split(".")

return (int(parts[0]), int(parts[1]), int(parts[2]))

return tuple(int(x) for x in version.split("."))

malfet

Asked claude to write succint diff for validating that something is positive semver and this is what it comeup with

#!/usr/bin/env python3
"""Exit 0 iff `git diff` is a single property line with a positive semver bump."""
import re
import subprocess
import sys

SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)")
PROP = re.compile(r"^[-+]\s*([\w.-]+)\s*[:=]\s*['\"]?v?(\d+\.\d+\.\d+)['\"]?\s*,?\s*$")


def main() -> int:
    diff = subprocess.check_output(["git", "diff", "--unified=0", "--no-color"], text=True)
    hunks = [l for l in diff.splitlines() if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]
    if len(hunks) != 2 or hunks[0][0] == hunks[1][0]:
        return 1
    old, new = sorted(hunks)  # '-' < '+'
    om, nm = PROP.match(old), PROP.match(new)
    if not (om and nm) or om.group(1) != nm.group(1):
        return 1
    ov, nv = tuple(map(int, om.group(2).split("."))), tuple(map(int, nm.group(2).split(".")))
    return 0 if nv > ov else 1


if __name__ == "__main__":
    sys.exit(main())

jeanschmidt · 2026-05-28T02:06:30Z

Thanks — taking the nits at lines 37/42/46/51 in the next revision (kwargs rename, dict-merge _emit, comprehension _semver_tuple, stale "prior bash regex" comment deleted).

On replacing the script with the ~25-line version: pushing back, three reasons.

1. Two callers, two diff shapes. osdc-renovate-autoapprove.yml (pre-merge) has no checkout and consumes gh api repos/X/pulls/N/files (array). osdc-auto-update-deploy-prod.yml (post-merge) consumes gh api repos/X/commits/SHA (object). git diff --unified=0 needs a fetch+checkout in both — more YAML, not less.

2. The regex is intentionally over-specified. ^[-+]\s*([\w.-]+)\s*[:=]\s*['\"]?v?(\d+\.\d+\.\d+) matches any key: semver on any file. This gate is defense-in-depth against a compromised RENOVATE_TOKEN: it must refuse anything that isn't literally ^ runner_image_tag:[ \t]*"X.Y.Z"...$ inside osdc/clusters.yaml. A loose regex turns the gate into a rubber stamp.

3. Structured decision= drives the close-comment UX. On reject the workflow posts Auto-closed: close-downgrade — 1.3.0 -> 1.2.99 and Renovate retries on the next eligible version. A boolean exit collapses seven distinct reasons into "validation failed" — harder to triage stale closures.

Why wrong file count is not an error?

Non-zero exit aborts the job before the close-comment step runs. Exit codes are reserved for unrecoverable input errors (missing/malformed PATCH_JSON); deterministic outcomes go through decision= so one YAML handler covers all rejection branches.

The script is ~130 LOC, another 300ish lines are tests to meet the 97% coverage and test things like pin every branch — downgrade, 1.10.0 > 1.9.0 lex-vs-numeric, unquoted version, wrong indent, wrong key, inline-comment tolerance, hunk-header handling, both API shapes. For the security boundary on prod runner rollout, I'd rather over-test.

[ghstack-poisoned]

github-actions · 2026-05-28T02:27:10Z

tofu plan — arc-cbr-production-uw1

✅ Plan succeeded · commit e190a7e0 · run log

Plan output

Installed 1 package in 2ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod-uw1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production-uw1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.eks.data.aws_ami.eks_optimized_al2023: Reading...
data.aws_availability_zones.available: Reading...
module.eks.data.aws_caller_identity.current: Reading...
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-s3]
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role]
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=1fb5d763-c5cd-4de5-bf40-712df992288c]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0121d1038d393182a]
module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNFWBLKNFS]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role:pytorch-arc-cbr-production-uw1-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
data.aws_availability_zones.available: Read complete after 1s [id=us-west-1]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-uw1-eks-secrets]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-07fd8394a1d58b614]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-07b06397ce403fa53]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-0b3b22b995e71d8d9]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0bd275a35f8e7ef65]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-08861bee27120b994]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0ce35bb011df0cfdb]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0a13e7b49c841e497]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-06d137da3460167c4]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0a8410ffa0f0014a7]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-05f5edbf2c6678c03]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-s3-20260519191031756900000001]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production-uw1]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-00184fa8d73e575c9]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0f79a2ac72857a304]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-01ec520e3931f5f6a]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0c336634317cc9f35]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-01165f36472c0a780]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-06e17b37b87d890f2]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production-uw1:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production-uw1:kube-proxy]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production-uw1:vpc-cni]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-066ae5f473a2b07c0]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0cc835aef3e3bcc21]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-02e4c54e5fa3b4f8a]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production-uw1:pytorch-arc-cbr-production-uw1-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 1s [id=ab5db6c82031e2d229412c67921160a3b3af073b]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-west-1.amazonaws.com/id/ED52EC64FF5CFAB4151C6E4B5DE279BD]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=3969145930]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-uw1-ebs-csi-driver-role]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production-uw1#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-uw1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production-uw1:coredns]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (arc-cbr-production-uw1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-scheduled-change]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-spot-interruption]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-west-1.amazonaws.com/308535385114/pytorch-arc-cbr-production-uw1-karpenter]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-rebalance]
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-instance-state-change]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-west-1.amazonaws.com/308535385114/pytorch-arc-cbr-production-uw1-karpenter]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-instance-state-change-KarpenterInstanceStateChange]
data.terraform_remote_state.base: Read complete after 1s
aws_ec2_tag.subnet_karpenter_discovery["subnet-08861bee27120b994"]: Refreshing state... [id=subnet-08861bee27120b994,karpenter.sh/discovery]
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-058909cc1cdc63fad,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0a13e7b49c841e497"]: Refreshing state... [id=subnet-0a13e7b49c841e497,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-karpenter-controller]
aws_iam_role.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-controller]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-controller-20260519195229107000000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (arc-cbr-production-uw1) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-pypi-wants-collector-s3]
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-0da5eaf2022d80aa0]
data.terraform_remote_state.base: Read complete after 1s
aws_iam_role.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wants-collector-role]
aws_iam_role.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1-efs-csi-driver-role]
aws_security_group.efs: Refreshing state... [id=sg-01c1f3fa51705db76]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wants-collector-role-20260519200350781900000004]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-role-20260519200350777100000003]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1-efs-csi-driver-role-20260519200350826400000005]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1:aws-efs-csi-driver]
aws_efs_mount_target.pypi_cache["subnet-0a13e7b49c841e497"]: Refreshing state... [id=fsmt-089fd42858a5a85ab]
aws_efs_mount_target.pypi_cache["subnet-08861bee27120b994"]: Refreshing state... [id=fsmt-00708cc923d4d2055]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

github-actions · 2026-05-28T02:31:03Z

tofu plan — arc-cbr-production

✅ Plan succeeded · commit e190a7e0 · run log

Plan output

Installed 1 package in 2ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


data.aws_availability_zones.available: Reading...
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3]
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-node-role]
module.eks.data.aws_caller_identity.current: Reading...
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0e712dc7e743bbcf7]
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=527854a4-e335-4f95-bc89-1321cff7a478]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-2]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNOLQFN6MU]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-eks-secrets]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=pytorch-arc-cbr-production-node-role:pytorch-arc-cbr-production-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-009f1fe7d56695348]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-032d4401e63f0c9b9]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-05e96ee7cb818e5c0]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0ab11fcdb8d4ea113]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0fddf2f74e7e978c7]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0992f582e9bf2836e]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0709abbcafa23aec0]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-0a583bbbcac436ebd]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-01187bfaa68514400]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-0577a02acde719bff]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0d26e280575e8aaf4]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-0d34063a19f4b07b4]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-01e479dcb5aedf696]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-07d5cd4c479c827ab]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-084975a7f7af2696e]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-0ce4fba002d90e7d5]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-08e264cbbd47be1ee]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0ad75b2f5282877db]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-0f7b8f4473e5790df]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-01d38d41a7ca82a08]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-0c7ecd4166a01e5f0]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-0cb3785c433ed7718]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-097abe4676c74f71b]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-0b6e08b4b0dc968c0]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0beb143017359bda1]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production:vpc-cni]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production:kube-proxy]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-0b820cd15307b6d57]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production:pytorch-arc-cbr-production-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=033a163afb2babc26f7883e642621ac361c93d61]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-east-2.amazonaws.com/id/0A621339248958D6D5F2FF084BD185B5]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=2879363015]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production:coredns]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (arc-cbr-production) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-spot-interruption]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-rebalance]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-scheduled-change]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/308535385114/pytorch-arc-cbr-production-karpenter]
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-instance-state-change]
data.terraform_remote_state.base: Read complete after 1s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-01ec5f742ae028981,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0577a02acde719bff"]: Refreshing state... [id=subnet-0577a02acde719bff,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0709abbcafa23aec0"]: Refreshing state... [id=subnet-0709abbcafa23aec0,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0992f582e9bf2836e"]: Refreshing state... [id=subnet-0992f582e9bf2836e,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-karpenter-controller]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/308535385114/pytorch-arc-cbr-production-karpenter]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-controller-20260518021844404100000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (arc-cbr-production) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-pypi-wants-collector-s3]
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-pypi-wheel-syncer-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-0deb818bbf18764de]
data.terraform_remote_state.base: Read complete after 0s
aws_security_group.efs: Refreshing state... [id=sg-0979eb5e3d9d3db9f]
aws_iam_role.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wants-collector-role]
aws_iam_role.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wheel-syncer-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-efs-csi-driver-role]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wheel-syncer-role-20260518023249929400000004]
aws_efs_mount_target.pypi_cache["subnet-0992f582e9bf2836e"]: Refreshing state... [id=fsmt-03523586bb4ff0c46]
aws_efs_mount_target.pypi_cache["subnet-0577a02acde719bff"]: Refreshing state... [id=fsmt-07d7b111b9cd6684e]
aws_efs_mount_target.pypi_cache["subnet-0709abbcafa23aec0"]: Refreshing state... [id=fsmt-08cd5108febbacef9]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wants-collector-role-20260518023249903900000003]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-efs-csi-driver-role-20260518023249955700000005]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production:aws-efs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

Update

c738be2

[ghstack-poisoned]

jeanschmidt requested a review from huydhn as a code owner May 27, 2026 01:59

jeanschmidt had a problem deploying to osdc-staging May 27, 2026 01:59 — with GitHub Actions Error

malfet suggested changes May 27, 2026

View reviewed changes

malfet reviewed May 27, 2026

View reviewed changes

Update

ec1e7a5

[ghstack-poisoned]

jeanschmidt had a problem deploying to osdc-staging May 28, 2026 02:26 — with GitHub Actions Error

jeanschmidt temporarily deployed to osdc-staging May 28, 2026 02:26 — with GitHub Actions Inactive

jeanschmidt temporarily deployed to osdc-staging May 28, 2026 02:29 — with GitHub Actions Inactive

jeanschmidt temporarily deployed to osdc-staging May 28, 2026 02:31 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scripts: add validate-runner-bump.py for Renovate diff validation#636

scripts: add validate-runner-bump.py for Renovate diff validation#636
jeanschmidt wants to merge 2 commits into
gh/jeanschmidt/33/basefrom
gh/jeanschmidt/33/head

jeanschmidt commented May 27, 2026 •

edited

Loading

Uh oh!

malfet left a comment

Uh oh!

malfet May 27, 2026

Uh oh!

malfet May 27, 2026

Uh oh!

malfet May 27, 2026

Uh oh!

malfet May 27, 2026

Uh oh!

malfet May 27, 2026

Uh oh!

malfet left a comment

Uh oh!

jeanschmidt commented May 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		EXPECTED_FILE = "osdc/clusters.yaml"

		# MUST remain bit-for-bit equivalent to the prior bash regex so the pre-

		LINE_PATTERN = re.compile(r'^ runner_image_tag:[ \t]"(?P<ver>\d+\.\d+\.\d+)"[ \t](#.*)?$')


		def _emit(decision: str, reason: str, **extra: str) -> None:

		parts = version.split(".")
		return (int(parts[0]), int(parts[1]), int(parts[2]))

	parts = version.split(".")
	return (int(parts[0]), int(parts[1]), int(parts[2]))
	return tuple(int(x) for x in version.split("."))

Conversation

jeanschmidt commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How

Changes

Notes

Testing

Uh oh!

malfet left a comment

Choose a reason for hiding this comment

Uh oh!

malfet May 27, 2026

Choose a reason for hiding this comment

Uh oh!

malfet May 27, 2026

Choose a reason for hiding this comment

Uh oh!

malfet May 27, 2026

Choose a reason for hiding this comment

Uh oh!

malfet May 27, 2026

Choose a reason for hiding this comment

Uh oh!

malfet May 27, 2026

Choose a reason for hiding this comment

Uh oh!

malfet left a comment

Choose a reason for hiding this comment

Uh oh!

jeanschmidt commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

tofu plan — arc-cbr-production-uw1

Uh oh!

github-actions Bot commented May 28, 2026

tofu plan — arc-cbr-production

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jeanschmidt commented May 27, 2026 •

edited

Loading

jeanschmidt commented May 28, 2026 •

edited

Loading

github-actions Bot commented May 28, 2026 •

edited

Loading