buildkit: deploy on meta-prod-aws-ue1 (us-east-1) by huydhn · Pull Request #728 · pytorch/ci-infra

huydhn · 2026-06-10T23:15:09Z

Stack from ghstack (oldest at bottom):

Same buildkit config as arc-cbr-production (ue2): m6id.24xlarge amd64 (2/node)

m7gd.16xlarge arm64 (4/node), autoscaling min 2/4, max 360/30, fallback 32/8.
Adds keda + buildkit to the cluster's module list (keda provides the CRDs the
autoscaling needs).

[ghstack-poisoned]

github-actions · 2026-06-10T23:16:28Z

tofu plan — meta-prod-aws-ue1

✅ Plan succeeded · commit 2cb58b74 · run log

Plan output

Installed 1 package in 2ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod-ue1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (meta-prod-aws-ue1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


data.aws_availability_zones.available: Reading...
module.eks.data.aws_caller_identity.current: Reading...
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=meta-prod-aws-ue1-harbor-s3]
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=9274017b-776a-41bd-9f11-d118a1174159]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-046818728dce02486]
module.eks.aws_iam_role.cluster: Refreshing state... [id=meta-prod-aws-ue1-cluster-role]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.eks.aws_iam_role.node: Refreshing state... [id=meta-prod-aws-ue1-node-role]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNGRUDTXPT]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/meta-prod-aws-ue1-eks-secrets]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-1]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=meta-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=meta-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=meta-prod-aws-ue1-node-role:meta-prod-aws-ue1-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=meta-prod-aws-ue1-harbor-s3/arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-0dafeb02304897431]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-0ce44cb6446f3c1b6]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-0cf3d9cf37ee998b6]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0beb5fc44f0ee165f]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0348c5058db524cd2]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0eafd792589fbb363]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-02ce11d6646870431]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-033772b4490df1b41]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-00c2e2605c4dea199]
module.vpc.aws_eip.nat_secondary["us-east-1b-0"]: Refreshing state... [id=eipalloc-0bcfe1f98793e1b12]
module.vpc.aws_eip.nat_secondary["us-east-1c-3"]: Refreshing state... [id=eipalloc-0af54aa2e5f40dfa4]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0d65ec2dd49f0d87c]
module.vpc.aws_eip.nat_secondary["us-east-1b-2"]: Refreshing state... [id=eipalloc-0f0b720f4cca62ec7]
module.vpc.aws_eip.nat_secondary["us-east-1b-3"]: Refreshing state... [id=eipalloc-0c8291ee817240e1f]
module.vpc.aws_eip.nat_secondary["us-east-1c-5"]: Refreshing state... [id=eipalloc-04fe645562f597aaa]
module.vpc.aws_eip.nat_secondary["us-east-1b-5"]: Refreshing state... [id=eipalloc-0d078dc6f07628714]
module.vpc.aws_eip.nat_secondary["us-east-1c-1"]: Refreshing state... [id=eipalloc-0cb5208c5f775baf6]
module.vpc.aws_eip.nat_secondary["us-east-1a-2"]: Refreshing state... [id=eipalloc-080ec4e265ebdc5ad]
module.vpc.aws_eip.nat_secondary["us-east-1c-0"]: Refreshing state... [id=eipalloc-05844040c7248f44f]
module.vpc.aws_eip.nat_secondary["us-east-1a-5"]: Refreshing state... [id=eipalloc-01f89a7c130d2a810]
module.vpc.aws_eip.nat_secondary["us-east-1a-1"]: Refreshing state... [id=eipalloc-08c7bd3306cf687ca]
module.vpc.aws_eip.nat_secondary["us-east-1c-6"]: Refreshing state... [id=eipalloc-0d22d3aa0667a1070]
module.vpc.aws_eip.nat_secondary["us-east-1b-6"]: Refreshing state... [id=eipalloc-0f922f499d32f1368]
module.vpc.aws_eip.nat_secondary["us-east-1a-6"]: Refreshing state... [id=eipalloc-02e84a51a14c9cbda]
module.vpc.aws_eip.nat_secondary["us-east-1c-2"]: Refreshing state... [id=eipalloc-025ef0e1813277c67]
module.vpc.aws_eip.nat_secondary["us-east-1a-3"]: Refreshing state... [id=eipalloc-0bda13d7b70c00c00]
module.vpc.aws_eip.nat_secondary["us-east-1c-4"]: Refreshing state... [id=eipalloc-00c5df9f3b60f353d]
module.vpc.aws_eip.nat_secondary["us-east-1b-4"]: Refreshing state... [id=eipalloc-0aba12aa23c11d20c]
module.vpc.aws_eip.nat_secondary["us-east-1b-1"]: Refreshing state... [id=eipalloc-0d095305019486ae6]
module.vpc.aws_eip.nat_secondary["us-east-1a-0"]: Refreshing state... [id=eipalloc-0c8a6faed0a97479d]
module.vpc.aws_eip.nat_secondary["us-east-1a-4"]: Refreshing state... [id=eipalloc-09fa171393c3a7cfb]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-07bfd0f170c3b3406]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0f922406e02ecba1d]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-078f44b58c8b48ade]
module.eks.aws_eks_cluster.this: Refreshing state... [id=meta-prod-aws-ue1]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=meta-prod-aws-ue1:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=meta-prod-aws-ue1:vpc-cni]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_launch_template.base: Refreshing state... [id=lt-043779597e3b5a7fd]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=meta-prod-aws-ue1:kube-proxy]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-05e7e66e960593972]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-05da47c4ed26ae390]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0616491b7baeab47f]
module.eks.aws_eks_node_group.base: Refreshing state... [id=meta-prod-aws-ue1:meta-prod-aws-ue1-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=b1b539daa206035ae3c3e28288b0681fa1b462f3]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/6C84A48E1BF23A027C1E78912A368743]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=3022997555]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=meta-prod-aws-ue1-ebs-csi-driver-role]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-025de56c0aac8d3f0]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0cff785d8001fc914]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-09414719983019b49]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-05d5b7a41aa6323ed]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-0c665948be8d0282e]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-09287d705ce4a88bc]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=meta-prod-aws-ue1:coredns]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry/arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=meta-prod-aws-ue1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=meta-prod-aws-ue1#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1:aws-ebs-csi-driver]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-09dca398d838d4247]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-0306281246323bd27]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-02a8683fa7258f295]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (meta-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-instance-state-change]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=meta-prod-aws-ue1-karpenter-rebalance]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=meta-prod-aws-ue1-karpenter-spot-interruption]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-scheduled-change]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/308535385114/meta-prod-aws-ue1-karpenter]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/308535385114/meta-prod-aws-ue1-karpenter]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=meta-prod-aws-ue1-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=meta-prod-aws-ue1-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-instance-state-change-KarpenterInstanceStateChange]
data.terraform_remote_state.base: Read complete after 1s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-016f4a0d209f3e4a9,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0d65ec2dd49f0d87c"]: Refreshing state... [id=subnet-0d65ec2dd49f0d87c,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-02ce11d6646870431"]: Refreshing state... [id=subnet-02ce11d6646870431,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0348c5058db524cd2"]: Refreshing state... [id=subnet-0348c5058db524cd2,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=meta-prod-aws-ue1-karpenter-controller]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=meta-prod-aws-ue1-karpenter-controller-20260528200455768400000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (meta-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-pypi-wants-collector-s3]
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-pypi-wheel-syncer-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-023e57b36ec1cd426]
data.terraform_remote_state.base: Read complete after 2s
aws_iam_role.wheel_syncer: Refreshing state... [id=meta-prod-aws-ue1-pypi-wheel-syncer-role]
aws_iam_role.wants_collector: Refreshing state... [id=meta-prod-aws-ue1-pypi-wants-collector-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1-efs-csi-driver-role]
aws_security_group.efs: Refreshing state... [id=sg-0bc06caa62214c9b7]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=meta-prod-aws-ue1-pypi-wheel-syncer-role-20260528201106257700000005]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1-efs-csi-driver-role-20260528201106116400000003]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=meta-prod-aws-ue1-pypi-wants-collector-role-20260528201106192600000004]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1:aws-efs-csi-driver]
aws_efs_mount_target.pypi_cache["subnet-0d65ec2dd49f0d87c"]: Refreshing state... [id=fsmt-0ffaedc58eceb7749]
aws_efs_mount_target.pypi_cache["subnet-0348c5058db524cd2"]: Refreshing state... [id=fsmt-0500c573cafe66133]
aws_efs_mount_target.pypi_cache["subnet-02ce11d6646870431"]: Refreshing state... [id=fsmt-06a05c001541338d2]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

[ghstack-poisoned]

Same buildkit config as arc-cbr-production (ue2): m6id.24xlarge amd64 (2/node) + m7gd.16xlarge arm64 (4/node), autoscaling min 2/4, max 360/30, fallback 32/8. Adds keda + buildkit to the cluster's module list (keda provides the CRDs the autoscaling needs). ghstack-source-id: f6bf677 Pull-Request: #728

github-actions · 2026-06-10T23:24:34Z

tofu plan — arc-cbr-production

✅ Plan succeeded · commit 2cb58b74 · run log

Plan output

Installed 1 package in 2ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0e712dc7e743bbcf7]
module.eks.data.aws_caller_identity.current: Reading...
data.aws_availability_zones.available: Reading...
module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-node-role]
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3]
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=527854a4-e335-4f95-bc89-1321cff7a478]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
data.aws_availability_zones.available: Read complete after 1s [id=us-east-2]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNOLQFN6MU]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-eks-secrets]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=pytorch-arc-cbr-production-node-role:pytorch-arc-cbr-production-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-009f1fe7d56695348]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-032d4401e63f0c9b9]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-05e96ee7cb818e5c0]
module.vpc.aws_eip.nat_secondary["us-east-2a-6"]: Refreshing state... [id=eipalloc-0113c95dbdec2f879]
module.vpc.aws_eip.nat_secondary["us-east-2b-6"]: Refreshing state... [id=eipalloc-06b7b88826199a232]
module.vpc.aws_eip.nat_secondary["us-east-2b-4"]: Refreshing state... [id=eipalloc-0de33181548ac2e5a]
module.vpc.aws_eip.nat_secondary["us-east-2b-3"]: Refreshing state... [id=eipalloc-021ee6c9f1d20b71a]
module.vpc.aws_eip.nat_secondary["us-east-2b-2"]: Refreshing state... [id=eipalloc-063bee447616351f9]
module.vpc.aws_eip.nat_secondary["us-east-2c-0"]: Refreshing state... [id=eipalloc-03542e74755fc105b]
module.vpc.aws_eip.nat_secondary["us-east-2a-2"]: Refreshing state... [id=eipalloc-09b15a770e0c6d552]
module.vpc.aws_eip.nat_secondary["us-east-2a-1"]: Refreshing state... [id=eipalloc-0f2b00a9ac31df215]
module.vpc.aws_eip.nat_secondary["us-east-2c-6"]: Refreshing state... [id=eipalloc-0aede78edc69cf695]
module.vpc.aws_eip.nat_secondary["us-east-2a-3"]: Refreshing state... [id=eipalloc-034d5e1f5a2fcb795]
module.vpc.aws_eip.nat_secondary["us-east-2c-4"]: Refreshing state... [id=eipalloc-0cc3dadec18bbb3f3]
module.vpc.aws_eip.nat_secondary["us-east-2a-5"]: Refreshing state... [id=eipalloc-0bd9bf54bd6010323]
module.vpc.aws_eip.nat_secondary["us-east-2a-0"]: Refreshing state... [id=eipalloc-086a011b3c26c0dd7]
module.vpc.aws_eip.nat_secondary["us-east-2c-5"]: Refreshing state... [id=eipalloc-02825435a2786b3d8]
module.vpc.aws_eip.nat_secondary["us-east-2b-1"]: Refreshing state... [id=eipalloc-0e67c0a8cd8c990da]
module.vpc.aws_eip.nat_secondary["us-east-2b-0"]: Refreshing state... [id=eipalloc-0cead990d60ce181e]
module.vpc.aws_eip.nat_secondary["us-east-2c-2"]: Refreshing state... [id=eipalloc-07cfdb2fd5dc07459]
module.vpc.aws_eip.nat_secondary["us-east-2c-3"]: Refreshing state... [id=eipalloc-0d3a71569b2f687be]
module.vpc.aws_eip.nat_secondary["us-east-2c-1"]: Refreshing state... [id=eipalloc-06a980076e99cda81]
module.vpc.aws_eip.nat_secondary["us-east-2a-4"]: Refreshing state... [id=eipalloc-067d535102a61d1a8]
module.vpc.aws_eip.nat_secondary["us-east-2b-5"]: Refreshing state... [id=eipalloc-0cde9a6463901f1e1]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0d26e280575e8aaf4]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-0d34063a19f4b07b4]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0ab11fcdb8d4ea113]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-0577a02acde719bff]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0992f582e9bf2836e]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0709abbcafa23aec0]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-01e479dcb5aedf696]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-0a583bbbcac436ebd]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-01187bfaa68514400]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0fddf2f74e7e978c7]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-084975a7f7af2696e]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-0ce4fba002d90e7d5]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-07d5cd4c479c827ab]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0ad75b2f5282877db]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-08e264cbbd47be1ee]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-0f7b8f4473e5790df]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-0c7ecd4166a01e5f0]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-01d38d41a7ca82a08]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-0cb3785c433ed7718]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0beb143017359bda1]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-0b6e08b4b0dc968c0]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-097abe4676c74f71b]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production:kube-proxy]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production:vpc-cni]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-0b820cd15307b6d57]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production:pytorch-arc-cbr-production-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=033a163afb2babc26f7883e642621ac361c93d61]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-east-2.amazonaws.com/id/0A621339248958D6D5F2FF084BD185B5]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=2879363015]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production:coredns]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (arc-cbr-production) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-scheduled-change]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-rebalance]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/308535385114/pytorch-arc-cbr-production-karpenter]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-spot-interruption]
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-instance-state-change]
data.terraform_remote_state.base: Read complete after 1s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-01ec5f742ae028981,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0992f582e9bf2836e"]: Refreshing state... [id=subnet-0992f582e9bf2836e,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0577a02acde719bff"]: Refreshing state... [id=subnet-0577a02acde719bff,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0709abbcafa23aec0"]: Refreshing state... [id=subnet-0709abbcafa23aec0,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-controller]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/308535385114/pytorch-arc-cbr-production-karpenter]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-karpenter-controller]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-controller-20260518021844404100000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (arc-cbr-production) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-pypi-wheel-syncer-s3]
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-pypi-wants-collector-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-0deb818bbf18764de]
data.terraform_remote_state.base: Read complete after 1s
aws_iam_role.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wheel-syncer-role]
aws_iam_role.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wants-collector-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-efs-csi-driver-role]
aws_security_group.efs: Refreshing state... [id=sg-0979eb5e3d9d3db9f]
aws_efs_mount_target.pypi_cache["subnet-0992f582e9bf2836e"]: Refreshing state... [id=fsmt-03523586bb4ff0c46]
aws_efs_mount_target.pypi_cache["subnet-0709abbcafa23aec0"]: Refreshing state... [id=fsmt-08cd5108febbacef9]
aws_efs_mount_target.pypi_cache["subnet-0577a02acde719bff"]: Refreshing state... [id=fsmt-07d7b111b9cd6684e]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wheel-syncer-role-20260518023249929400000004]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wants-collector-role-20260518023249903900000003]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-efs-csi-driver-role-20260518023249955700000005]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production:aws-efs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

huydhn · 2026-06-11T00:25:41Z

I suspect that there might be more to this than meets the eye here, not the autoscaling part, but the Buildkit in multiple regions part. @jeanschmidt @zxiiro If you have successfully deployed Buildkit in multiple regions, let me know. I'm referring to these lines specifically https://github.com/pytorch/ci-infra/blob/main/osdc/modules/arc-runners/defs/rel-l-x86iavx512-8-64.yaml#L11-L12

[ghstack-poisoned]

Same buildkit config as arc-cbr-production (ue2): m6id.24xlarge amd64 (2/node) + m7gd.16xlarge arm64 (4/node), autoscaling min 2/4, max 360/30, fallback 32/8. Adds keda + buildkit to the cluster's module list (keda provides the CRDs the autoscaling needs). ghstack-source-id: 0170c40 Pull-Request: #728

github-actions · 2026-06-11T01:00:00Z

tofu plan — arc-cbr-production-uw1

✅ Plan succeeded · commit 2cb58b74 · run log

Plan output

Installed 1 package in 2ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod-uw1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production-uw1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0121d1038d393182a]
module.eks.data.aws_caller_identity.current: Reading...
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=1fb5d763-c5cd-4de5-bf40-712df992288c]
data.aws_availability_zones.available: Reading...
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-s3]
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNFWBLKNFS]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role:pytorch-arc-cbr-production-uw1-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
data.aws_availability_zones.available: Read complete after 0s [id=us-west-1]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-uw1-eks-secrets]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-07fd8394a1d58b614]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-0b3b22b995e71d8d9]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-07b06397ce403fa53]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-05f5edbf2c6678c03]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0bd275a35f8e7ef65]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0a8410ffa0f0014a7]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-06d137da3460167c4]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0ce35bb011df0cfdb]
module.vpc.aws_eip.nat_secondary["us-west-1c-3"]: Refreshing state... [id=eipalloc-09f89978685e7f3c7]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-08861bee27120b994]
module.vpc.aws_eip.nat_secondary["us-west-1a-5"]: Refreshing state... [id=eipalloc-059986f686b188dc2]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0a13e7b49c841e497]
module.vpc.aws_eip.nat_secondary["us-west-1a-2"]: Refreshing state... [id=eipalloc-0647e169131be5893]
module.vpc.aws_eip.nat_secondary["us-west-1c-5"]: Refreshing state... [id=eipalloc-0635efedc10ee5f66]
module.vpc.aws_eip.nat_secondary["us-west-1c-0"]: Refreshing state... [id=eipalloc-0d565f5bf077b05cf]
module.vpc.aws_eip.nat_secondary["us-west-1c-1"]: Refreshing state... [id=eipalloc-0bd09c7f2dcaa0a46]
module.vpc.aws_eip.nat_secondary["us-west-1a-4"]: Refreshing state... [id=eipalloc-0dfae88698dce850e]
module.vpc.aws_eip.nat_secondary["us-west-1a-1"]: Refreshing state... [id=eipalloc-012ac413772344fea]
module.vpc.aws_eip.nat_secondary["us-west-1a-3"]: Refreshing state... [id=eipalloc-05a2bad636af56f4d]
module.vpc.aws_eip.nat_secondary["us-west-1c-4"]: Refreshing state... [id=eipalloc-0dfaa16c61333ceb3]
module.vpc.aws_eip.nat_secondary["us-west-1a-6"]: Refreshing state... [id=eipalloc-08763a35db0a26caa]
module.vpc.aws_eip.nat_secondary["us-west-1a-0"]: Refreshing state... [id=eipalloc-0e3ca79e34012a238]
module.vpc.aws_eip.nat_secondary["us-west-1c-2"]: Refreshing state... [id=eipalloc-0f2e15b6a36b52fac]
module.vpc.aws_eip.nat_secondary["us-west-1c-6"]: Refreshing state... [id=eipalloc-0cf91a032d10f4ec5]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-harbor-registry]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-00184fa8d73e575c9]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0f79a2ac72857a304]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production-uw1]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-s3-20260519191031756900000001]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0c336634317cc9f35]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-01ec520e3931f5f6a]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-06e17b37b87d890f2]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-01165f36472c0a780]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0cc835aef3e3bcc21]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-02e4c54e5fa3b4f8a]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production-uw1:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production-uw1:kube-proxy]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production-uw1:vpc-cni]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-066ae5f473a2b07c0]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production-uw1:pytorch-arc-cbr-production-uw1-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=ab5db6c82031e2d229412c67921160a3b3af073b]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-west-1.amazonaws.com/id/ED52EC64FF5CFAB4151C6E4B5DE279BD]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=3969145930]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-uw1-ebs-csi-driver-role]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production-uw1#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production-uw1:coredns]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-uw1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (arc-cbr-production-uw1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-instance-state-change]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-spot-interruption]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-west-1.amazonaws.com/308535385114/pytorch-arc-cbr-production-uw1-karpenter]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-rebalance]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-scheduled-change]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-rebalance-KarpenterRebalance]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-west-1.amazonaws.com/308535385114/pytorch-arc-cbr-production-uw1-karpenter]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-instance-state-change-KarpenterInstanceStateChange]
data.terraform_remote_state.base: Read complete after 1s
aws_ec2_tag.subnet_karpenter_discovery["subnet-0a13e7b49c841e497"]: Refreshing state... [id=subnet-0a13e7b49c841e497,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-08861bee27120b994"]: Refreshing state... [id=subnet-08861bee27120b994,karpenter.sh/discovery]
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-058909cc1cdc63fad,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-karpenter-controller]
aws_iam_role.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-controller]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-controller-20260519195229107000000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (arc-cbr-production-uw1) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-s3]
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-pypi-wants-collector-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-0da5eaf2022d80aa0]
data.terraform_remote_state.base: Read complete after 1s
aws_iam_role.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wants-collector-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1-efs-csi-driver-role]
aws_iam_role.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-role]
aws_security_group.efs: Refreshing state... [id=sg-01c1f3fa51705db76]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wants-collector-role-20260519200350781900000004]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1-efs-csi-driver-role-20260519200350826400000005]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-role-20260519200350777100000003]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1:aws-efs-csi-driver]
aws_efs_mount_target.pypi_cache["subnet-0a13e7b49c841e497"]: Refreshing state... [id=fsmt-089fd42858a5a85ab]
aws_efs_mount_target.pypi_cache["subnet-08861bee27120b994"]: Refreshing state... [id=fsmt-00708cc923d4d2055]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

[ghstack-poisoned]

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * pytorch#727 * pytorch#728 * pytorch#726 * pytorch#725 * __->__ pytorch#724 Same min per arch as staging (amd64 2 / arm64 4). Max sized from 14-day docker-build concurrency: amd64 128 (peak 105 + headroom), arm64 16 (peak 8, likely capped by the old fixed pool).

[ghstack-poisoned]

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * pytorch#727 * pytorch#728 * pytorch#726 * __->__ pytorch#725 Burst 8 parallel buildctl builds per arch (each holds a maxconn=1 slot ~10m via sleep). With amd64_min=2 they serialize ~43m > timeout 30m and fail unless KEDA scales the pool up; one wave ~18m when it does.

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * pytorch#727 * pytorch#728 * __->__ pytorch#726 * pytorch#725 Enable the KEDA operator's Prometheus endpoint (prometheus.operator.enabled) and add a ServiceMonitor scraping keda_* (scaler/scaledobject values, errors, activity, latency) at 60s. Lets us see what KEDA reads and when it errors / falls back.

huydhn · 2026-06-18T02:08:34Z

This is ready to land now given the multiple Buildkit clusters test on staging #667, but I think I will hold on to this PR a bit longer until after the next prod deployment

[ghstack-poisoned]

github-actions · 2026-06-18T21:03:33Z

tofu plan — lf-prod-aws-ue1

✅ Plan succeeded · commit 2cb58b74 · run log

Plan output

Installed 1 package in 2ms
{
    "BucketArn": "arn:aws:s3:::lf-osdc-tfstate-prod-ue1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (lf-prod-aws-ue1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.eks.data.aws_caller_identity.current: Reading...
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.aws_iam_role.cluster: Refreshing state... [id=lf-prod-aws-ue1-cluster-role]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-06f350eae88f37700]
data.aws_availability_zones.available: Reading...
module.eks.aws_iam_role.node: Refreshing state... [id=lf-prod-aws-ue1-node-role]
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=lf-prod-aws-ue1-harbor-s3]
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=e5e45db6-94ad-4dfd-8a1a-213730256a9c]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=lf-prod-aws-ue1-harbor-registry]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=391835788720]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAVWOZ3UWYJZNKMI7G]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-1]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/lf-prod-aws-ue1-eks-secrets]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=lf-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=lf-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=lf-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=lf-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=lf-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=lf-prod-aws-ue1-node-role:lf-prod-aws-ue1-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=lf-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-0dafeb02304897431]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-03548aa6de237de4c]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-089c5123e6da8d43c]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=lf-prod-aws-ue1-harbor-registry]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=lf-prod-aws-ue1-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::391835788720:policy/lf-prod-aws-ue1-harbor-registry]
module.vpc.aws_eip.nat_secondary["us-east-1b-4"]: Refreshing state... [id=eipalloc-0bd28bd3991297f4a]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-051217c40b1d02b3a]
module.vpc.aws_eip.nat_secondary["us-east-1a-3"]: Refreshing state... [id=eipalloc-01a2c9bf10e45099b]
module.vpc.aws_eip.nat_secondary["us-east-1a-2"]: Refreshing state... [id=eipalloc-0625b097098b1ac2a]
module.vpc.aws_eip.nat_secondary["us-east-1b-2"]: Refreshing state... [id=eipalloc-0c66936151cceca74]
module.vpc.aws_eip.nat_secondary["us-east-1c-4"]: Refreshing state... [id=eipalloc-0db3b4c44cbf47d5a]
module.vpc.aws_eip.nat_secondary["us-east-1c-6"]: Refreshing state... [id=eipalloc-088384cdd02d04bce]
module.vpc.aws_eip.nat_secondary["us-east-1c-2"]: Refreshing state... [id=eipalloc-0a4874208e55dfb7b]
module.vpc.aws_eip.nat_secondary["us-east-1a-5"]: Refreshing state... [id=eipalloc-0936080d9155a0306]
module.vpc.aws_eip.nat_secondary["us-east-1c-0"]: Refreshing state... [id=eipalloc-051a8792e8dad6c5b]
module.vpc.aws_eip.nat_secondary["us-east-1c-3"]: Refreshing state... [id=eipalloc-03ec1105bba33668b]
module.vpc.aws_eip.nat_secondary["us-east-1a-1"]: Refreshing state... [id=eipalloc-034348675ffacd849]
module.vpc.aws_eip.nat_secondary["us-east-1b-5"]: Refreshing state... [id=eipalloc-00cd91c376a1f197d]
module.vpc.aws_eip.nat_secondary["us-east-1c-1"]: Refreshing state... [id=eipalloc-01b2275a4f494fe58]
module.vpc.aws_eip.nat_secondary["us-east-1b-6"]: Refreshing state... [id=eipalloc-085e43aacdd3b5c5f]
module.vpc.aws_eip.nat_secondary["us-east-1a-0"]: Refreshing state... [id=eipalloc-03131d115b478f7c3]
module.vpc.aws_eip.nat_secondary["us-east-1a-6"]: Refreshing state... [id=eipalloc-046fa83874bde66b5]
module.vpc.aws_eip.nat_secondary["us-east-1c-5"]: Refreshing state... [id=eipalloc-07612f4e715e508ae]
module.vpc.aws_eip.nat_secondary["us-east-1b-0"]: Refreshing state... [id=eipalloc-0533e7098b8548fda]
module.vpc.aws_eip.nat_secondary["us-east-1a-4"]: Refreshing state... [id=eipalloc-0581e7f2f8194266e]
module.vpc.aws_eip.nat_secondary["us-east-1b-1"]: Refreshing state... [id=eipalloc-0f13b0cd68a133531]
module.vpc.aws_eip.nat_secondary["us-east-1b-3"]: Refreshing state... [id=eipalloc-042e254711d0d3dda]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-087f338c446cffe5d]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-0f16b1a5ecd405405]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-0fe9cf01a4661b360]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0afe958a38da9f46c]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-06e680510bc45584b]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-0f7184dc74425b3ca]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-01ca3df6137b445c0]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-07234379e7833a398]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0fa332056910f46b2]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=lf-prod-aws-ue1-harbor-s3/arn:aws:iam::391835788720:policy/lf-prod-aws-ue1-harbor-registry]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0a6049a78a7428383]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-025956fd021d43094]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-025d5eab2da94f8e6]
module.eks.aws_eks_cluster.this: Refreshing state... [id=lf-prod-aws-ue1]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-0cf871626838e4133]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-09c6f66e50dce1835]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-0d0e9d964d1cd8a9e]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-027811d9ba750a284]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-0264db606b7f24bb6]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-02536bbe724eaaa2f]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=lf-prod-aws-ue1:vpc-cni]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=lf-prod-aws-ue1:kube-proxy]
module.eks.aws_eks_access_entry.cluster_admin["lf_osdc_admin"]: Refreshing state... [id=lf-prod-aws-ue1:arn:aws:iam::391835788720:role/lf_osdc_admin]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-0e3be05a985acc61c]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-00cd583d71292870b]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-009170e5cc902aa3e]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-0c6e77db648d5279c]
module.eks.aws_eks_node_group.base: Refreshing state... [id=lf-prod-aws-ue1:lf-prod-aws-ue1-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=b1b539daa206035ae3c3e28288b0681fa1b462f3]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::391835788720:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/E8EF4A6C55DB9699E53A54DA444C21A3]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=lf-prod-aws-ue1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=717515682]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=lf-prod-aws-ue1-ebs-csi-driver-role]
module.eks.aws_eks_access_policy_association.cluster_admin["lf_osdc_admin"]: Refreshing state... [id=lf-prod-aws-ue1#arn:aws:iam::391835788720:role/lf_osdc_admin#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=lf-prod-aws-ue1:coredns]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=lf-prod-aws-ue1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=lf-prod-aws-ue1-harbor-registry/arn:aws:iam::391835788720:policy/lf-prod-aws-ue1-harbor-registry]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=lf-prod-aws-ue1:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (lf-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=lf-prod-aws-ue1-karpenter-instance-state-change]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=lf-prod-aws-ue1-karpenter-rebalance]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=lf-prod-aws-ue1-karpenter-spot-interruption]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=lf-prod-aws-ue1-karpenter-scheduled-change]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/391835788720/lf-prod-aws-ue1-karpenter]
data.terraform_remote_state.base: Read complete after 0s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-07a769a5e8a93a444,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-07234379e7833a398"]: Refreshing state... [id=subnet-07234379e7833a398,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-01ca3df6137b445c0"]: Refreshing state... [id=subnet-01ca3df6137b445c0,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0fa332056910f46b2"]: Refreshing state... [id=subnet-0fa332056910f46b2,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=lf-prod-aws-ue1-karpenter-controller]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/391835788720/lf-prod-aws-ue1-karpenter]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::391835788720:policy/lf-prod-aws-ue1-karpenter-controller]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=lf-prod-aws-ue1-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=lf-prod-aws-ue1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=lf-prod-aws-ue1-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=lf-prod-aws-ue1-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=lf-prod-aws-ue1-karpenter-controller-20260605165913470400000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

github-actions · 2026-06-18T21:04:20Z

tofu plan — lf-prod-aws-ue2

✅ Plan succeeded · commit 2cb58b74 · run log

Plan output

Installed 1 package in 1ms
{
    "BucketArn": "arn:aws:s3:::lf-osdc-tfstate-prod-ue2",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (lf-prod-aws-ue2) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.eks.data.aws_ami.eks_optimized_al2023: Reading...
data.aws_availability_zones.available: Reading...
module.eks.aws_iam_role.node: Refreshing state... [id=lf-prod-aws-ue2-node-role]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0f7d54e3accfbe3e4]
module.eks.aws_iam_role.cluster: Refreshing state... [id=lf-prod-aws-ue2-cluster-role]
module.eks.data.aws_caller_identity.current: Reading...
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=lf-prod-aws-ue2-harbor-registry]
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=lf-prod-aws-ue2-harbor-s3]
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=27a9b8e9-2509-43ce-ac8e-cfc320b65fe2]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=391835788720]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAVWOZ3UWYMGG4LIHB]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-2]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=lf-prod-aws-ue2-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=lf-prod-aws-ue2-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=lf-prod-aws-ue2-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=lf-prod-aws-ue2-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=lf-prod-aws-ue2-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=lf-prod-aws-ue2-node-role:lf-prod-aws-ue2-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=lf-prod-aws-ue2-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/lf-prod-aws-ue2-eks-secrets]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 0s [id=ami-009f1fe7d56695348]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::391835788720:policy/lf-prod-aws-ue2-harbor-registry]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=lf-prod-aws-ue2-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=lf-prod-aws-ue2-harbor-registry]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-061f8f7ac8b40d720]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-042c4d31ed557eaa4]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=lf-prod-aws-ue2-harbor-s3/arn:aws:iam::391835788720:policy/lf-prod-aws-ue2-harbor-registry]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0508ab6e3db7ccf08]
module.vpc.aws_eip.nat_secondary["us-east-2c-1"]: Refreshing state... [id=eipalloc-0241d507f34cdb0b5]
module.vpc.aws_eip.nat_secondary["us-east-2a-5"]: Refreshing state... [id=eipalloc-095865342b4c692ac]
module.vpc.aws_eip.nat_secondary["us-east-2a-3"]: Refreshing state... [id=eipalloc-0737f1fdf35a0f975]
module.vpc.aws_eip.nat_secondary["us-east-2b-5"]: Refreshing state... [id=eipalloc-06f0755f7542d77fa]
module.vpc.aws_eip.nat_secondary["us-east-2c-2"]: Refreshing state... [id=eipalloc-08e66df79eddc18b5]
module.vpc.aws_eip.nat_secondary["us-east-2a-6"]: Refreshing state... [id=eipalloc-077cbc910a56d08fd]
module.vpc.aws_eip.nat_secondary["us-east-2c-6"]: Refreshing state... [id=eipalloc-057df768d859ed17e]
module.vpc.aws_eip.nat_secondary["us-east-2c-4"]: Refreshing state... [id=eipalloc-005e21ac878c4db34]
module.vpc.aws_eip.nat_secondary["us-east-2b-1"]: Refreshing state... [id=eipalloc-0c8d74e3dcfb2dad0]
module.vpc.aws_eip.nat_secondary["us-east-2c-5"]: Refreshing state... [id=eipalloc-06c020042f283554a]
module.vpc.aws_eip.nat_secondary["us-east-2a-0"]: Refreshing state... [id=eipalloc-0a9078e90b80cc1de]
module.vpc.aws_eip.nat_secondary["us-east-2b-6"]: Refreshing state... [id=eipalloc-09c38605941dbbaac]
module.vpc.aws_eip.nat_secondary["us-east-2b-0"]: Refreshing state... [id=eipalloc-055182abe5c634ddc]
module.vpc.aws_eip.nat_secondary["us-east-2b-2"]: Refreshing state... [id=eipalloc-0403ed9359182b72c]
module.vpc.aws_eip.nat_secondary["us-east-2b-4"]: Refreshing state... [id=eipalloc-08683a31d5967bff6]
module.vpc.aws_eip.nat_secondary["us-east-2a-1"]: Refreshing state... [id=eipalloc-0a90e8e5b75a3fe45]
module.vpc.aws_eip.nat_secondary["us-east-2c-3"]: Refreshing state... [id=eipalloc-04d97b3aec8f5fb8a]
module.vpc.aws_eip.nat_secondary["us-east-2a-4"]: Refreshing state... [id=eipalloc-0bd8a5e170892bb0b]
module.vpc.aws_eip.nat_secondary["us-east-2c-0"]: Refreshing state... [id=eipalloc-09bd4b74b1a8ca6ac]
module.vpc.aws_eip.nat_secondary["us-east-2b-3"]: Refreshing state... [id=eipalloc-0b95441aa4e161db2]
module.vpc.aws_eip.nat_secondary["us-east-2a-2"]: Refreshing state... [id=eipalloc-0e53d306d25151b0e]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0e0efd2a8ef20d72e]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-08c041a7cb9147705]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-079ed57d9de06fd9b]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-016d460df617c0e2c]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-080bfdf02da937445]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0e53846501278171e]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-06a9b2e4ea40968b6]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0515848329e5dc53a]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0ae8d251d3a0336ca]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-005f847cdca1f2143]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-028a6f03785f6bca2]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-0d0f31615161dab0f]
module.eks.aws_eks_cluster.this: Refreshing state... [id=lf-prod-aws-ue2]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0288194135c91a55d]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-095cd56cd812b4931]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-0caff297f1b93f0c7]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-0d0497dd1d2a111f5]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-0ce64842bfadf32b0]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-0d7230758d05b4f20]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-0feb6707491379e22]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0cbaf74e1bd57a865]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-05de05c204a439484]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=lf-prod-aws-ue2:vpc-cni]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=lf-prod-aws-ue2:kube-proxy]
module.eks.aws_eks_access_entry.cluster_admin["lf_osdc_admin"]: Refreshing state... [id=lf-prod-aws-ue2:arn:aws:iam::391835788720:role/lf_osdc_admin]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_launch_template.base: Refreshing state... [id=lt-062d0b42e1b1ca1af]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=033a163afb2babc26f7883e642621ac361c93d61]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::391835788720:oidc-provider/oidc.eks.us-east-2.amazonaws.com/id/43EEAC690CC76E15781134A4FC06EDCE]
module.eks.aws_eks_node_group.base: Refreshing state... [id=lf-prod-aws-ue2:lf-prod-aws-ue2-base-nodes]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=796338164]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=lf-prod-aws-ue2-harbor-registry]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=lf-prod-aws-ue2-ebs-csi-driver-role]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=lf-prod-aws-ue2-harbor-registry/arn:aws:iam::391835788720:policy/lf-prod-aws-ue2-harbor-registry]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=lf-prod-aws-ue2-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=lf-prod-aws-ue2:aws-ebs-csi-driver]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=lf-prod-aws-ue2:coredns]
module.eks.aws_eks_access_policy_association.cluster_admin["lf_osdc_admin"]: Refreshing state... [id=lf-prod-aws-ue2#arn:aws:iam::391835788720:role/lf_osdc_admin#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (lf-prod-aws-ue2) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=lf-prod-aws-ue2-karpenter-instance-state-change]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=lf-prod-aws-ue2-karpenter-rebalance]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=lf-prod-aws-ue2-karpenter-scheduled-change]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=lf-prod-aws-ue2-karpenter-spot-interruption]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/391835788720/lf-prod-aws-ue2-karpenter]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/391835788720/lf-prod-aws-ue2-karpenter]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=lf-prod-aws-ue2-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=lf-prod-aws-ue2-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=lf-prod-aws-ue2-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=lf-prod-aws-ue2-karpenter-rebalance-KarpenterRebalance]
data.terraform_remote_state.base: Read complete after 2s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-06c1f2ed8ffb1ddfa,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=lf-prod-aws-ue2-karpenter-controller]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::391835788720:policy/lf-prod-aws-ue2-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0ae8d251d3a0336ca"]: Refreshing state... [id=subnet-0ae8d251d3a0336ca,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0515848329e5dc53a"]: Refreshing state... [id=subnet-0515848329e5dc53a,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-06a9b2e4ea40968b6"]: Refreshing state... [id=subnet-06a9b2e4ea40968b6,karpenter.sh/discovery]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=lf-prod-aws-ue2-karpenter-controller-20260608235145776400000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

huydhn added 2 commits June 10, 2026 16:15

Update (base update)

c307166

[ghstack-poisoned]

Update

3b333a4

[ghstack-poisoned]

huydhn requested a review from jeanschmidt as a code owner June 10, 2026 23:15

huydhn had a problem deploying to osdc-staging June 10, 2026 23:15 — with GitHub Actions Error

huydhn mentioned this pull request Jun 10, 2026

BuildKit autoscaling on staging: in-cluster KEDA + LB queue + warm baseline #723

Merged

huydhn had a problem deploying to osdc-staging June 10, 2026 23:15 — with GitHub Actions Error

This was referenced Jun 10, 2026

buildkit: enable autoscaling on prod (arc-cbr-production) #724

Merged

integration-tests: add buildkit autoscaling scale test #725

Merged

monitoring: scrape KEDA operator metrics #726

Merged

huydhn temporarily deployed to osdc-staging June 10, 2026 23:15 — with GitHub Actions Inactive

huydhn mentioned this pull request Jun 10, 2026

monitoring: add buildkit autoscaling alerts #727

Open

huydhn added 2 commits June 10, 2026 16:23

Update (base update)

49a07a4

[ghstack-poisoned]

Update

96d2bfa

[ghstack-poisoned]

huydhn temporarily deployed to osdc-staging June 10, 2026 23:23 — with GitHub Actions Inactive

huydhn had a problem deploying to osdc-staging June 10, 2026 23:24 — with GitHub Actions Error

huydhn temporarily deployed to osdc-staging June 10, 2026 23:25 — with GitHub Actions Inactive

huydhn mentioned this pull request Jun 10, 2026

docker-builds: retry remote BuildKit connect on the autoscaled pool pytorch/pytorch#186955

Closed

jeanschmidt approved these changes Jun 11, 2026

View reviewed changes

huydhn added 2 commits June 10, 2026 17:57

Update (base update)

6659ea9

[ghstack-poisoned]

Update

d45a64e

[ghstack-poisoned]

huydhn temporarily deployed to osdc-staging June 11, 2026 00:57 — with GitHub Actions Inactive

huydhn temporarily deployed to osdc-staging June 11, 2026 00:58 — with GitHub Actions Inactive

huydhn temporarily deployed to osdc-staging June 11, 2026 01:00 — with GitHub Actions Inactive

Update (base update)

e1df282

[ghstack-poisoned]

Update

4fc4d64

[ghstack-poisoned]

huydhn temporarily deployed to osdc-staging June 11, 2026 22:18 — with GitHub Actions Inactive

huydhn temporarily deployed to osdc-staging June 11, 2026 22:20 — with GitHub Actions Inactive

huydhn temporarily deployed to osdc-staging June 11, 2026 22:22 — with GitHub Actions Inactive

huydhn added 2 commits June 12, 2026 02:20

Update (base update)

dbee5e7

[ghstack-poisoned]

Update

56151ee

[ghstack-poisoned]

huydhn temporarily deployed to osdc-staging June 12, 2026 09:20 — with GitHub Actions Inactive

huydhn temporarily deployed to osdc-staging June 12, 2026 09:21 — with GitHub Actions Inactive

huydhn temporarily deployed to osdc-staging June 12, 2026 09:23 — with GitHub Actions Inactive

zxiiro approved these changes Jun 18, 2026

View reviewed changes

huydhn changed the base branch from gh/huydhn/38/base to main June 18, 2026 20:33

huydhn changed the base branch from main to gh/huydhn/38/base June 18, 2026 20:33

huydhn added 2 commits June 18, 2026 13:58

Update (base update)

ae3cb5d

[ghstack-poisoned]

Update

152cda3

[ghstack-poisoned]

huydhn temporarily deployed to osdc-staging June 18, 2026 20:58 — with GitHub Actions Inactive

huydhn temporarily deployed to osdc-staging June 18, 2026 21:00 — with GitHub Actions Inactive

huydhn temporarily deployed to osdc-staging June 18, 2026 21:01 — with GitHub Actions Inactive

huydhn changed the base branch from gh/huydhn/38/base to main June 18, 2026 21:02

huydhn temporarily deployed to osdc-staging June 18, 2026 21:02 — with GitHub Actions Inactive

huydhn temporarily deployed to osdc-staging June 18, 2026 21:03 — with GitHub Actions Inactive

huydhn added this pull request to the merge queue Jun 18, 2026

Merged via the queue into main with commit ab1ac4b Jun 18, 2026
12 checks passed

huydhn deleted the gh/huydhn/38/head branch June 18, 2026 21:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

buildkit: deploy on meta-prod-aws-ue1 (us-east-1)#728

buildkit: deploy on meta-prod-aws-ue1 (us-east-1)#728
huydhn merged 18 commits into
mainfrom
gh/huydhn/38/head

huydhn commented Jun 10, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 10, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 10, 2026 •

edited

Loading

Uh oh!

huydhn commented Jun 11, 2026

Uh oh!

github-actions Bot commented Jun 11, 2026 •

edited

Loading

Uh oh!

huydhn commented Jun 18, 2026

Uh oh!

github-actions Bot commented Jun 18, 2026

Uh oh!

github-actions Bot commented Jun 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

huydhn commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

tofu plan — meta-prod-aws-ue1

Uh oh!

github-actions Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

tofu plan — arc-cbr-production

Uh oh!

huydhn commented Jun 11, 2026

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

tofu plan — arc-cbr-production-uw1

Uh oh!

huydhn commented Jun 18, 2026

Uh oh!

github-actions Bot commented Jun 18, 2026

tofu plan — lf-prod-aws-ue1

Uh oh!

github-actions Bot commented Jun 18, 2026

tofu plan — lf-prod-aws-ue2

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

huydhn commented Jun 10, 2026 •

edited

Loading

github-actions Bot commented Jun 10, 2026 •

edited

Loading

github-actions Bot commented Jun 10, 2026 •

edited

Loading

github-actions Bot commented Jun 11, 2026 •

edited

Loading