Skip to content

[RFC] surface memory control knobs for nodepools, needed for static memory manager#791

Draft
georgehong wants to merge 6 commits into
mainfrom
numa-memory-manager-rfc
Draft

[RFC] surface memory control knobs for nodepools, needed for static memory manager#791
georgehong wants to merge 6 commits into
mainfrom
numa-memory-manager-rfc

Conversation

@georgehong

@georgehong georgehong commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

(WIP) RFC

Context

Surfaces the ability for nodepools to enable memoryManagerPolicy: Static which NUMA-aware scheduling needs. Requested resources for a runner (GPU, CPU, and Memory) need to be enumerated, and CPU is already treated this way in the node config. Not having this means scheduler will return cannot align pod/pending state.

Memory management is also coupled with needing the specification of reservedMemory, which has to be precise to the byte to account for pod overhead for kubelet. Syntax is given by the following source:

reservedMemory:
  - numaNode: 0 # NUMA node index
    limits:
      memory: "1Gi" # byte quantity
  - numaNode: 1
    limits:
      memory: "2Gi" # byte quantity

For our cases, memory limits would likely be symmetrical.

Formula provided at https://kubernetes.io/docs/tasks/administer-cluster/memory-manager/:
Screenshot 2026-06-17 at 5 40 36 PM

This can be worked out for AWS nodes. In the p4d/A100 example, we can observe there are multiple sources where this is present:

+++ b/osdc/modules/nodepools/defs/p4d.yaml
+      kube_reserved_memory: 8500Mi
+      system_reserved_memory: "0"
+      eviction_hard_memory_available: 100Mi
+      reserved_memory:
+        - numa_node: 0
+          memory: 4300Mi
+        - numa_node: 1
+          memory: 4300Mi

max pods can also be deduced from running nodes and other configs, but it's important to avoid drift if AMI images change or ADM changes merge behavior of some of these node config parameters. Guardrails can be added in the form of alerts, if the image suddenly changes and node fails to start after X minutes.

Contents

  • Adds ability to specify reserved_memory block and associated parameters.
  • just command for materializing the node config changes.

Risks

Testing

RFC for team discussion — adds the MECHANISM only; no fleet is enabled.
The p4d enablement (defs/p4d.yaml) is intentionally NOT included here so we
can agree on the approach before flipping a scarce-capacity fleet.

## Why

Under topologyManagerPolicy=single-numa-node + scope=pod, scheduler-plugins
NodeResourceTopologyMatch ANDs the per-NUMA bitmask of every requested native
resource. Today GPU nodes run memoryManagerPolicy=None, so the kubelet never
publishes per-NUMA memory into the NodeResourceTopology (NRT). A Guaranteed GPU
pod that requests memory therefore ANDs an empty bitmask -> `cannot align pod`,
permanently Pending. (This is "Bug #2" of the NUMA E2E; "Bug #1" = the NFD
topology-updater snapshot race, tracked separately.)

Fix: enable kubelet Memory Manager Static on single-numa-node GPU fleets so
memory becomes a NUMA-tracked resource the scheduler can align.

## What this commit adds (generate_nodepools.py)

- `_parse_mem_quantity()` — exact byte parsing of K8s quantities (Ki/Mi/Gi/Ti).
- `_memory_manager_block()` — emits the kubelet.config Memory Manager block into
  EC2NodeClass userData, ONLY when a def opts in, GATED to
  topology_manager_policy == single-numa-node (raises otherwise), and VALIDATES
  the kubelet boot-gate invariant at generation time:
      sum(reservedMemory) == kubeReserved.memory + systemReserved.memory
                             + evictionHard.memory.available
  A mismatch fails `just test` / generation instead of bricking a node's boot
  (kubelet refuses to start if the sum is wrong -> fresh node never joins).
- Pass-through of the new per-def keys for fleet-format defs.
- TestMemoryManager (9 cases): gated emit, boot-gate validation, mixed units,
  misconfig rejection, fleet pass-through. Uses synthetic defs, so it needs no
  real-def change; the existing real-def round-trip still passes against the
  unmodified p4d.yaml.

No generated output changes (0 of 75 fleets opt in) until a def is enabled.

## What enablement WOULD look like (NOT in this commit — for discussion)

defs/p4d.yaml, under the p4d.24xlarge instance (fully-pinned variant so the
boot-gate sum is immune to EKS maxPods/AMI formula drift):

    memory_manager_policy: Static
    kube_reserved_memory: 8500Mi          # EKS floor 8362 (=11Mi*737+255) + headroom
    system_reserved_memory: "0"
    eviction_hard_memory_available: 100Mi
    reserved_memory:                      # must total 8500 + 0 + 100 = 8600Mi
      - { numa_node: 0, memory: 4300Mi }
      - { numa_node: 1, memory: 4300Mi }

Open questions for the team:
- Pinned (above) vs exact-match (mirror EKS's formula-derived kubeReserved and
  leave it owned by EKS)? Pinned is drift-proof but we own the number.
- Roll mechanics: memoryManagerPolicy is boot-only; GPU fleets have
  disruption_budget=0, so enabling requires a manual node roll per fleet.
- Per-fleet numbers differ (p5/p6 have different maxPods -> different kubeReserved).

## Validation (done on staging, fully-pinned p4d)

Rolled one fresh p4d in meta-staging-aws-ue1 and validated E2E with a real
pytorch-canary GPU job: node reached Ready (boot gate correct), nodeadm
deep-merge preserved EKS sibling reservations, NRT gained a memory zone, and the
GPU workflow pod aligned cpu+memory+gpu via numa-scheduler with no `cannot
align`. Negative tests (over-one-zone memory / GPU) were correctly refused.

Still TODO before any enablement ships: the Bug #1 fix (topology-updater
ordering + GPU-aware wait-for-nrt.py) must land alongside it.
@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown

tofu plan — arc-cbr-production

✅ Plan succeeded · commit b69897c0 · run log

Plan output
Installed 1 package in 2ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


data.aws_availability_zones.available: Reading...
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0e712dc7e743bbcf7]
module.eks.data.aws_caller_identity.current: Reading...
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3]
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=527854a4-e335-4f95-bc89-1321cff7a478]
module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-node-role]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNOLQFN6MU]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-eks-secrets]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=pytorch-arc-cbr-production-node-role:pytorch-arc-cbr-production-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-2]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-05e96ee7cb818e5c0]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-032d4401e63f0c9b9]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 0s [id=ami-009f1fe7d56695348]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-0577a02acde719bff]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0709abbcafa23aec0]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0fddf2f74e7e978c7]
module.vpc.aws_eip.nat_secondary["us-east-2b-4"]: Refreshing state... [id=eipalloc-0de33181548ac2e5a]
module.vpc.aws_eip.nat_secondary["us-east-2a-6"]: Refreshing state... [id=eipalloc-0113c95dbdec2f879]
module.vpc.aws_eip.nat_secondary["us-east-2a-5"]: Refreshing state... [id=eipalloc-0bd9bf54bd6010323]
module.vpc.aws_eip.nat_secondary["us-east-2b-3"]: Refreshing state... [id=eipalloc-021ee6c9f1d20b71a]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0992f582e9bf2836e]
module.vpc.aws_eip.nat_secondary["us-east-2a-4"]: Refreshing state... [id=eipalloc-067d535102a61d1a8]
module.vpc.aws_eip.nat_secondary["us-east-2b-5"]: Refreshing state... [id=eipalloc-0cde9a6463901f1e1]
module.vpc.aws_eip.nat_secondary["us-east-2a-3"]: Refreshing state... [id=eipalloc-034d5e1f5a2fcb795]
module.vpc.aws_eip.nat_secondary["us-east-2c-6"]: Refreshing state... [id=eipalloc-0aede78edc69cf695]
module.vpc.aws_eip.nat_secondary["us-east-2c-0"]: Refreshing state... [id=eipalloc-03542e74755fc105b]
module.vpc.aws_eip.nat_secondary["us-east-2b-6"]: Refreshing state... [id=eipalloc-06b7b88826199a232]
module.vpc.aws_eip.nat_secondary["us-east-2c-1"]: Refreshing state... [id=eipalloc-06a980076e99cda81]
module.vpc.aws_eip.nat_secondary["us-east-2c-4"]: Refreshing state... [id=eipalloc-0cc3dadec18bbb3f3]
module.vpc.aws_eip.nat_secondary["us-east-2b-2"]: Refreshing state... [id=eipalloc-063bee447616351f9]
module.vpc.aws_eip.nat_secondary["us-east-2b-1"]: Refreshing state... [id=eipalloc-0e67c0a8cd8c990da]
module.vpc.aws_eip.nat_secondary["us-east-2c-2"]: Refreshing state... [id=eipalloc-07cfdb2fd5dc07459]
module.vpc.aws_eip.nat_secondary["us-east-2a-1"]: Refreshing state... [id=eipalloc-0f2b00a9ac31df215]
module.vpc.aws_eip.nat_secondary["us-east-2c-3"]: Refreshing state... [id=eipalloc-0d3a71569b2f687be]
module.vpc.aws_eip.nat_secondary["us-east-2c-5"]: Refreshing state... [id=eipalloc-02825435a2786b3d8]
module.vpc.aws_eip.nat_secondary["us-east-2b-0"]: Refreshing state... [id=eipalloc-0cead990d60ce181e]
module.vpc.aws_eip.nat_secondary["us-east-2a-2"]: Refreshing state... [id=eipalloc-09b15a770e0c6d552]
module.vpc.aws_eip.nat_secondary["us-east-2a-0"]: Refreshing state... [id=eipalloc-086a011b3c26c0dd7]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-01e479dcb5aedf696]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-0a583bbbcac436ebd]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-01187bfaa68514400]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0d26e280575e8aaf4]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-0d34063a19f4b07b4]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0ab11fcdb8d4ea113]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-07d5cd4c479c827ab]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-084975a7f7af2696e]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-0ce4fba002d90e7d5]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-0f7b8f4473e5790df]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0ad75b2f5282877db]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-08e264cbbd47be1ee]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-0cb3785c433ed7718]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-0c7ecd4166a01e5f0]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-01d38d41a7ca82a08]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production:kube-proxy]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production:vpc-cni]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_launch_template.base: Refreshing state... [id=lt-0b820cd15307b6d57]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-097abe4676c74f71b]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0beb143017359bda1]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-0b6e08b4b0dc968c0]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production:pytorch-arc-cbr-production-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 1s [id=033a163afb2babc26f7883e642621ac361c93d61]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-east-2.amazonaws.com/id/0A621339248958D6D5F2FF084BD185B5]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=2879363015]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production:coredns]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (arc-cbr-production) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-instance-state-change]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-scheduled-change]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-rebalance]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-spot-interruption]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/308535385114/pytorch-arc-cbr-production-karpenter]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/308535385114/pytorch-arc-cbr-production-karpenter]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-spot-interruption-KarpenterSpotInterruption]
data.terraform_remote_state.base: Read complete after 1s
aws_ec2_tag.subnet_karpenter_discovery["subnet-0577a02acde719bff"]: Refreshing state... [id=subnet-0577a02acde719bff,karpenter.sh/discovery]
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-01ec5f742ae028981,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0709abbcafa23aec0"]: Refreshing state... [id=subnet-0709abbcafa23aec0,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0992f582e9bf2836e"]: Refreshing state... [id=subnet-0992f582e9bf2836e,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-karpenter-controller]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-controller-20260518021844404100000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown

tofu plan — arc-cbr-production-uw1

✅ Plan succeeded · commit b69897c0 · run log

Plan output
Installed 1 package in 2ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod-uw1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production-uw1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.eks.data.aws_caller_identity.current: Reading...
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=1fb5d763-c5cd-4de5-bf40-712df992288c]
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role]
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
data.aws_availability_zones.available: Reading...
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0121d1038d393182a]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-s3]
module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNFWBLKNFS]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role:pytorch-arc-cbr-production-uw1-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
data.aws_availability_zones.available: Read complete after 1s [id=us-west-1]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-uw1-eks-secrets]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-07fd8394a1d58b614]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-0b3b22b995e71d8d9]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-07b06397ce403fa53]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0ce35bb011df0cfdb]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0bd275a35f8e7ef65]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0a8410ffa0f0014a7]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-06d137da3460167c4]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-08861bee27120b994]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0a13e7b49c841e497]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-05f5edbf2c6678c03]
module.vpc.aws_eip.nat_secondary["us-west-1a-4"]: Refreshing state... [id=eipalloc-0dfae88698dce850e]
module.vpc.aws_eip.nat_secondary["us-west-1a-3"]: Refreshing state... [id=eipalloc-05a2bad636af56f4d]
module.vpc.aws_eip.nat_secondary["us-west-1c-0"]: Refreshing state... [id=eipalloc-0d565f5bf077b05cf]
module.vpc.aws_eip.nat_secondary["us-west-1a-1"]: Refreshing state... [id=eipalloc-012ac413772344fea]
module.vpc.aws_eip.nat_secondary["us-west-1a-5"]: Refreshing state... [id=eipalloc-059986f686b188dc2]
module.vpc.aws_eip.nat_secondary["us-west-1a-0"]: Refreshing state... [id=eipalloc-0e3ca79e34012a238]
module.vpc.aws_eip.nat_secondary["us-west-1c-1"]: Refreshing state... [id=eipalloc-0bd09c7f2dcaa0a46]
module.vpc.aws_eip.nat_secondary["us-west-1c-4"]: Refreshing state... [id=eipalloc-0dfaa16c61333ceb3]
module.vpc.aws_eip.nat_secondary["us-west-1c-5"]: Refreshing state... [id=eipalloc-0635efedc10ee5f66]
module.vpc.aws_eip.nat_secondary["us-west-1a-6"]: Refreshing state... [id=eipalloc-08763a35db0a26caa]
module.vpc.aws_eip.nat_secondary["us-west-1c-2"]: Refreshing state... [id=eipalloc-0f2e15b6a36b52fac]
module.vpc.aws_eip.nat_secondary["us-west-1c-3"]: Refreshing state... [id=eipalloc-09f89978685e7f3c7]
module.vpc.aws_eip.nat_secondary["us-west-1a-2"]: Refreshing state... [id=eipalloc-0647e169131be5893]
module.vpc.aws_eip.nat_secondary["us-west-1c-6"]: Refreshing state... [id=eipalloc-0cf91a032d10f4ec5]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-00184fa8d73e575c9]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0f79a2ac72857a304]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-s3-20260519191031756900000001]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production-uw1]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production-uw1:kube-proxy]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production-uw1:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production-uw1:vpc-cni]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-066ae5f473a2b07c0]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production-uw1:pytorch-arc-cbr-production-uw1-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=ab5db6c82031e2d229412c67921160a3b3af073b]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-west-1.amazonaws.com/id/ED52EC64FF5CFAB4151C6E4B5DE279BD]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production-uw1#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=3969145930]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-uw1-ebs-csi-driver-role]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0c336634317cc9f35]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-01ec520e3931f5f6a]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production-uw1:coredns]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-uw1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1:aws-ebs-csi-driver]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-01165f36472c0a780]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-06e17b37b87d890f2]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0cc835aef3e3bcc21]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-02e4c54e5fa3b4f8a]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (arc-cbr-production-uw1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-rebalance]
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-instance-state-change]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-scheduled-change]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-spot-interruption]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-west-1.amazonaws.com/308535385114/pytorch-arc-cbr-production-uw1-karpenter]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-west-1.amazonaws.com/308535385114/pytorch-arc-cbr-production-uw1-karpenter]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-instance-state-change-KarpenterInstanceStateChange]
data.terraform_remote_state.base: Read complete after 1s
aws_ec2_tag.subnet_karpenter_discovery["subnet-08861bee27120b994"]: Refreshing state... [id=subnet-08861bee27120b994,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0a13e7b49c841e497"]: Refreshing state... [id=subnet-0a13e7b49c841e497,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-controller]
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-058909cc1cdc63fad,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-karpenter-controller]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-controller-20260519195229107000000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (arc-cbr-production-uw1) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-s3]
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-pypi-wants-collector-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-0da5eaf2022d80aa0]
data.terraform_remote_state.base: Read complete after 1s
aws_iam_role.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-role]
aws_iam_role.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wants-collector-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1-efs-csi-driver-role]
aws_security_group.efs: Refreshing state... [id=sg-01c1f3fa51705db76]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wants-collector-role-20260519200350781900000004]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-role-20260519200350777100000003]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1-efs-csi-driver-role-20260519200350826400000005]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1:aws-efs-csi-driver]
aws_efs_mount_target.pypi_cache["subnet-0a13e7b49c841e497"]: Refreshing state... [id=fsmt-089fd42858a5a85ab]
aws_efs_mount_target.pypi_cache["subnet-08861bee27120b994"]: Refreshing state... [id=fsmt-00708cc923d4d2055]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown

tofu plan — meta-prod-aws-ue1

✅ Plan succeeded · commit b69897c0 · run log

Plan output
Installed 1 package in 1ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod-ue1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (meta-prod-aws-ue1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.eks.data.aws_caller_identity.current: Reading...
data.aws_availability_zones.available: Reading...
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.aws_iam_role.node: Refreshing state... [id=meta-prod-aws-ue1-node-role]
module.eks.aws_iam_role.cluster: Refreshing state... [id=meta-prod-aws-ue1-cluster-role]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-046818728dce02486]
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=meta-prod-aws-ue1-harbor-s3]
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=9274017b-776a-41bd-9f11-d118a1174159]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNGRUDTXPT]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-1]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/meta-prod-aws-ue1-eks-secrets]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=meta-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=meta-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=meta-prod-aws-ue1-node-role:meta-prod-aws-ue1-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-0dafeb02304897431]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=meta-prod-aws-ue1-harbor-s3/arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-0ce44cb6446f3c1b6]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-0cf3d9cf37ee998b6]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0348c5058db524cd2]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0beb5fc44f0ee165f]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-02ce11d6646870431]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0d65ec2dd49f0d87c]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0f922406e02ecba1d]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-033772b4490df1b41]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-078f44b58c8b48ade]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-07bfd0f170c3b3406]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0eafd792589fbb363]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-00c2e2605c4dea199]
module.vpc.aws_eip.nat_secondary["us-east-1a-4"]: Refreshing state... [id=eipalloc-09fa171393c3a7cfb]
module.vpc.aws_eip.nat_secondary["us-east-1a-5"]: Refreshing state... [id=eipalloc-01f89a7c130d2a810]
module.vpc.aws_eip.nat_secondary["us-east-1b-4"]: Refreshing state... [id=eipalloc-0aba12aa23c11d20c]
module.vpc.aws_eip.nat_secondary["us-east-1c-3"]: Refreshing state... [id=eipalloc-0af54aa2e5f40dfa4]
module.vpc.aws_eip.nat_secondary["us-east-1c-5"]: Refreshing state... [id=eipalloc-04fe645562f597aaa]
module.vpc.aws_eip.nat_secondary["us-east-1b-3"]: Refreshing state... [id=eipalloc-0c8291ee817240e1f]
module.vpc.aws_eip.nat_secondary["us-east-1a-6"]: Refreshing state... [id=eipalloc-02e84a51a14c9cbda]
module.vpc.aws_eip.nat_secondary["us-east-1b-1"]: Refreshing state... [id=eipalloc-0d095305019486ae6]
module.vpc.aws_eip.nat_secondary["us-east-1c-2"]: Refreshing state... [id=eipalloc-025ef0e1813277c67]
module.vpc.aws_eip.nat_secondary["us-east-1a-1"]: Refreshing state... [id=eipalloc-08c7bd3306cf687ca]
module.vpc.aws_eip.nat_secondary["us-east-1c-0"]: Refreshing state... [id=eipalloc-05844040c7248f44f]
module.vpc.aws_eip.nat_secondary["us-east-1b-0"]: Refreshing state... [id=eipalloc-0bcfe1f98793e1b12]
module.vpc.aws_eip.nat_secondary["us-east-1c-6"]: Refreshing state... [id=eipalloc-0d22d3aa0667a1070]
module.vpc.aws_eip.nat_secondary["us-east-1c-4"]: Refreshing state... [id=eipalloc-00c5df9f3b60f353d]
module.vpc.aws_eip.nat_secondary["us-east-1a-0"]: Refreshing state... [id=eipalloc-0c8a6faed0a97479d]
module.vpc.aws_eip.nat_secondary["us-east-1a-3"]: Refreshing state... [id=eipalloc-0bda13d7b70c00c00]
module.vpc.aws_eip.nat_secondary["us-east-1b-2"]: Refreshing state... [id=eipalloc-0f0b720f4cca62ec7]
module.vpc.aws_eip.nat_secondary["us-east-1b-6"]: Refreshing state... [id=eipalloc-0f922f499d32f1368]
module.vpc.aws_eip.nat_secondary["us-east-1c-1"]: Refreshing state... [id=eipalloc-0cb5208c5f775baf6]
module.vpc.aws_eip.nat_secondary["us-east-1a-2"]: Refreshing state... [id=eipalloc-080ec4e265ebdc5ad]
module.vpc.aws_eip.nat_secondary["us-east-1b-5"]: Refreshing state... [id=eipalloc-0d078dc6f07628714]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-05e7e66e960593972]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-05da47c4ed26ae390]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0616491b7baeab47f]
module.eks.aws_eks_cluster.this: Refreshing state... [id=meta-prod-aws-ue1]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-09414719983019b49]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-025de56c0aac8d3f0]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0cff785d8001fc914]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=meta-prod-aws-ue1:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=meta-prod-aws-ue1:kube-proxy]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=meta-prod-aws-ue1:vpc-cni]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-043779597e3b5a7fd]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-09287d705ce4a88bc]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-05d5b7a41aa6323ed]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-0c665948be8d0282e]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-09dca398d838d4247]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-02a8683fa7258f295]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-0306281246323bd27]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=b1b539daa206035ae3c3e28288b0681fa1b462f3]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/6C84A48E1BF23A027C1E78912A368743]
module.eks.aws_eks_node_group.base: Refreshing state... [id=meta-prod-aws-ue1:meta-prod-aws-ue1-base-nodes]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=3022997555]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=meta-prod-aws-ue1-ebs-csi-driver-role]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=meta-prod-aws-ue1#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=meta-prod-aws-ue1:coredns]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=meta-prod-aws-ue1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry/arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (meta-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=meta-prod-aws-ue1-karpenter-spot-interruption]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=meta-prod-aws-ue1-karpenter-rebalance]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-scheduled-change]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/308535385114/meta-prod-aws-ue1-karpenter]
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-instance-state-change]
data.terraform_remote_state.base: Read complete after 0s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-016f4a0d209f3e4a9,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=meta-prod-aws-ue1-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-02ce11d6646870431"]: Refreshing state... [id=subnet-02ce11d6646870431,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0d65ec2dd49f0d87c"]: Refreshing state... [id=subnet-0d65ec2dd49f0d87c,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0348c5058db524cd2"]: Refreshing state... [id=subnet-0348c5058db524cd2,karpenter.sh/discovery]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/308535385114/meta-prod-aws-ue1-karpenter]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-karpenter-controller]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=meta-prod-aws-ue1-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=meta-prod-aws-ue1-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=meta-prod-aws-ue1-karpenter-controller-20260528200455768400000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (meta-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
data.terraform_remote_state.base: Read complete after 0s
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-pypi-wheel-syncer-s3]
aws_iam_role.wants_collector: Refreshing state... [id=meta-prod-aws-ue1-pypi-wants-collector-role]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-023e57b36ec1cd426]
aws_security_group.efs: Refreshing state... [id=sg-0bc06caa62214c9b7]
aws_iam_role.wheel_syncer: Refreshing state... [id=meta-prod-aws-ue1-pypi-wheel-syncer-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1-efs-csi-driver-role]
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-pypi-wants-collector-s3]
aws_efs_mount_target.pypi_cache["subnet-0348c5058db524cd2"]: Refreshing state... [id=fsmt-0500c573cafe66133]
aws_efs_mount_target.pypi_cache["subnet-02ce11d6646870431"]: Refreshing state... [id=fsmt-06a05c001541338d2]
aws_efs_mount_target.pypi_cache["subnet-0d65ec2dd49f0d87c"]: Refreshing state... [id=fsmt-0ffaedc58eceb7749]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=meta-prod-aws-ue1-pypi-wheel-syncer-role-20260528201106257700000005]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=meta-prod-aws-ue1-pypi-wants-collector-role-20260528201106192600000004]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1-efs-csi-driver-role-20260528201106116400000003]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1:aws-efs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown

tofu plan — lf-prod-aws-ue1

✅ Plan succeeded · commit b69897c0 · run log

Plan output
Installed 1 package in 1ms
{
    "BucketArn": "arn:aws:s3:::lf-osdc-tfstate-prod-ue1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (lf-prod-aws-ue1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.eks.aws_iam_role.node: Refreshing state... [id=lf-prod-aws-ue1-node-role]
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=lf-prod-aws-ue1-harbor-s3]
data.aws_availability_zones.available: Reading...
module.eks.aws_iam_role.cluster: Refreshing state... [id=lf-prod-aws-ue1-cluster-role]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-06f350eae88f37700]
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=e5e45db6-94ad-4dfd-8a1a-213730256a9c]
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.data.aws_caller_identity.current: Reading...
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=lf-prod-aws-ue1-harbor-registry]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=391835788720]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAVWOZ3UWYJZNKMI7G]
data.aws_availability_zones.available: Read complete after 1s [id=us-east-1]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/lf-prod-aws-ue1-eks-secrets]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=lf-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=lf-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=lf-prod-aws-ue1-node-role:lf-prod-aws-ue1-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=lf-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=lf-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=lf-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=lf-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-0dafeb02304897431]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-089c5123e6da8d43c]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-03548aa6de237de4c]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-087f338c446cffe5d]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-0fe9cf01a4661b360]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0afe958a38da9f46c]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-0f16b1a5ecd405405]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-051217c40b1d02b3a]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-06e680510bc45584b]
module.vpc.aws_eip.nat_secondary["us-east-1a-4"]: Refreshing state... [id=eipalloc-0581e7f2f8194266e]
module.vpc.aws_eip.nat_secondary["us-east-1c-0"]: Refreshing state... [id=eipalloc-051a8792e8dad6c5b]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-0f7184dc74425b3ca]
module.vpc.aws_eip.nat_secondary["us-east-1c-6"]: Refreshing state... [id=eipalloc-088384cdd02d04bce]
module.vpc.aws_eip.nat_secondary["us-east-1c-5"]: Refreshing state... [id=eipalloc-07612f4e715e508ae]
module.vpc.aws_eip.nat_secondary["us-east-1b-0"]: Refreshing state... [id=eipalloc-0533e7098b8548fda]
module.vpc.aws_eip.nat_secondary["us-east-1b-5"]: Refreshing state... [id=eipalloc-00cd91c376a1f197d]
module.vpc.aws_eip.nat_secondary["us-east-1b-4"]: Refreshing state... [id=eipalloc-0bd28bd3991297f4a]
module.vpc.aws_eip.nat_secondary["us-east-1a-6"]: Refreshing state... [id=eipalloc-046fa83874bde66b5]
module.vpc.aws_eip.nat_secondary["us-east-1b-6"]: Refreshing state... [id=eipalloc-085e43aacdd3b5c5f]
module.vpc.aws_eip.nat_secondary["us-east-1a-3"]: Refreshing state... [id=eipalloc-01a2c9bf10e45099b]
module.vpc.aws_eip.nat_secondary["us-east-1c-1"]: Refreshing state... [id=eipalloc-01b2275a4f494fe58]
module.vpc.aws_eip.nat_secondary["us-east-1c-2"]: Refreshing state... [id=eipalloc-0a4874208e55dfb7b]
module.vpc.aws_eip.nat_secondary["us-east-1b-2"]: Refreshing state... [id=eipalloc-0c66936151cceca74]
module.vpc.aws_eip.nat_secondary["us-east-1c-4"]: Refreshing state... [id=eipalloc-0db3b4c44cbf47d5a]
module.vpc.aws_eip.nat_secondary["us-east-1a-5"]: Refreshing state... [id=eipalloc-0936080d9155a0306]
module.vpc.aws_eip.nat_secondary["us-east-1a-2"]: Refreshing state... [id=eipalloc-0625b097098b1ac2a]
module.vpc.aws_eip.nat_secondary["us-east-1c-3"]: Refreshing state... [id=eipalloc-03ec1105bba33668b]
module.vpc.aws_eip.nat_secondary["us-east-1b-3"]: Refreshing state... [id=eipalloc-042e254711d0d3dda]
module.vpc.aws_eip.nat_secondary["us-east-1b-1"]: Refreshing state... [id=eipalloc-0f13b0cd68a133531]
module.vpc.aws_eip.nat_secondary["us-east-1a-1"]: Refreshing state... [id=eipalloc-034348675ffacd849]
module.vpc.aws_eip.nat_secondary["us-east-1a-0"]: Refreshing state... [id=eipalloc-03131d115b478f7c3]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-01ca3df6137b445c0]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-07234379e7833a398]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0fa332056910f46b2]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=lf-prod-aws-ue1-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::391835788720:policy/lf-prod-aws-ue1-harbor-registry]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=lf-prod-aws-ue1-harbor-registry]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-025d5eab2da94f8e6]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0a6049a78a7428383]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-025956fd021d43094]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-0cf871626838e4133]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-0d0e9d964d1cd8a9e]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-09c6f66e50dce1835]
module.eks.aws_eks_cluster.this: Refreshing state... [id=lf-prod-aws-ue1]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=lf-prod-aws-ue1-harbor-s3/arn:aws:iam::391835788720:policy/lf-prod-aws-ue1-harbor-registry]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-0264db606b7f24bb6]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-02536bbe724eaaa2f]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-027811d9ba750a284]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-009170e5cc902aa3e]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-00cd583d71292870b]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-0c6e77db648d5279c]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=lf-prod-aws-ue1:vpc-cni]
module.eks.aws_eks_access_entry.cluster_admin["lf_osdc_admin"]: Refreshing state... [id=lf-prod-aws-ue1:arn:aws:iam::391835788720:role/lf_osdc_admin]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=lf-prod-aws-ue1:kube-proxy]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-0e3be05a985acc61c]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_node_group.base: Refreshing state... [id=lf-prod-aws-ue1:lf-prod-aws-ue1-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=b1b539daa206035ae3c3e28288b0681fa1b462f3]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::391835788720:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/E8EF4A6C55DB9699E53A54DA444C21A3]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=lf-prod-aws-ue1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=717515682]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=lf-prod-aws-ue1-ebs-csi-driver-role]
module.eks.aws_eks_access_policy_association.cluster_admin["lf_osdc_admin"]: Refreshing state... [id=lf-prod-aws-ue1#arn:aws:iam::391835788720:role/lf_osdc_admin#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=lf-prod-aws-ue1:coredns]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=lf-prod-aws-ue1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=lf-prod-aws-ue1-harbor-registry/arn:aws:iam::391835788720:policy/lf-prod-aws-ue1-harbor-registry]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=lf-prod-aws-ue1:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (lf-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=lf-prod-aws-ue1-karpenter-scheduled-change]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=lf-prod-aws-ue1-karpenter-rebalance]
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=lf-prod-aws-ue1-karpenter-instance-state-change]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/391835788720/lf-prod-aws-ue1-karpenter]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=lf-prod-aws-ue1-karpenter-spot-interruption]
data.terraform_remote_state.base: Read complete after 0s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-07a769a5e8a93a444,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=lf-prod-aws-ue1-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-07234379e7833a398"]: Refreshing state... [id=subnet-07234379e7833a398,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-01ca3df6137b445c0"]: Refreshing state... [id=subnet-01ca3df6137b445c0,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0fa332056910f46b2"]: Refreshing state... [id=subnet-0fa332056910f46b2,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::391835788720:policy/lf-prod-aws-ue1-karpenter-controller]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/391835788720/lf-prod-aws-ue1-karpenter]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=lf-prod-aws-ue1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=lf-prod-aws-ue1-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=lf-prod-aws-ue1-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=lf-prod-aws-ue1-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=lf-prod-aws-ue1-karpenter-controller-20260605165913470400000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (lf-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::391835788720:policy/lf-prod-aws-ue1-pypi-wheel-syncer-s3]
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::391835788720:policy/lf-prod-aws-ue1-pypi-wants-collector-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-080ed1b100680046a]
data.terraform_remote_state.base: Read complete after 1s
aws_iam_role.efs_csi_driver: Refreshing state... [id=lf-prod-aws-ue1-efs-csi-driver-role]
aws_iam_role.wheel_syncer: Refreshing state... [id=lf-prod-aws-ue1-pypi-wheel-syncer-role]
aws_iam_role.wants_collector: Refreshing state... [id=lf-prod-aws-ue1-pypi-wants-collector-role]
aws_security_group.efs: Refreshing state... [id=sg-043a534e51b4cf754]
aws_efs_mount_target.pypi_cache["subnet-07234379e7833a398"]: Refreshing state... [id=fsmt-0a587f0a4f8a480eb]
aws_efs_mount_target.pypi_cache["subnet-01ca3df6137b445c0"]: Refreshing state... [id=fsmt-015ba05b2befeb7f3]
aws_efs_mount_target.pypi_cache["subnet-0fa332056910f46b2"]: Refreshing state... [id=fsmt-0855738bbe8ec9699]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=lf-prod-aws-ue1-pypi-wheel-syncer-role-20260605174406029100000004]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=lf-prod-aws-ue1-pypi-wants-collector-role-20260605174405984800000003]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=lf-prod-aws-ue1-efs-csi-driver-role-20260605174406081600000005]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=lf-prod-aws-ue1:aws-efs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown

tofu plan — lf-prod-aws-ue2

✅ Plan succeeded · commit b69897c0 · run log

Plan output
Installed 1 package in 2ms
{
    "BucketArn": "arn:aws:s3:::lf-osdc-tfstate-prod-ue2",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (lf-prod-aws-ue2) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


data.aws_availability_zones.available: Reading...
module.eks.aws_iam_role.cluster: Refreshing state... [id=lf-prod-aws-ue2-cluster-role]
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.aws_iam_role.node: Refreshing state... [id=lf-prod-aws-ue2-node-role]
module.eks.data.aws_caller_identity.current: Reading...
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=lf-prod-aws-ue2-harbor-s3]
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=27a9b8e9-2509-43ce-ac8e-cfc320b65fe2]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0f7d54e3accfbe3e4]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=lf-prod-aws-ue2-harbor-registry]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=391835788720]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAVWOZ3UWYMGG4LIHB]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=lf-prod-aws-ue2-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=lf-prod-aws-ue2-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-2]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/lf-prod-aws-ue2-eks-secrets]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=lf-prod-aws-ue2-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=lf-prod-aws-ue2-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=lf-prod-aws-ue2-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=lf-prod-aws-ue2-node-role:lf-prod-aws-ue2-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=lf-prod-aws-ue2-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-009f1fe7d56695348]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::391835788720:policy/lf-prod-aws-ue2-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=lf-prod-aws-ue2-harbor-registry]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=lf-prod-aws-ue2-harbor-registry]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-042c4d31ed557eaa4]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-061f8f7ac8b40d720]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=lf-prod-aws-ue2-harbor-s3/arn:aws:iam::391835788720:policy/lf-prod-aws-ue2-harbor-registry]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0e53846501278171e]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-06a9b2e4ea40968b6]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-08c041a7cb9147705]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-016d460df617c0e2c]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0515848329e5dc53a]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0e0efd2a8ef20d72e]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-080bfdf02da937445]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-079ed57d9de06fd9b]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0ae8d251d3a0336ca]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0508ab6e3db7ccf08]
module.vpc.aws_eip.nat_secondary["us-east-2b-6"]: Refreshing state... [id=eipalloc-09c38605941dbbaac]
module.vpc.aws_eip.nat_secondary["us-east-2a-0"]: Refreshing state... [id=eipalloc-0a9078e90b80cc1de]
module.vpc.aws_eip.nat_secondary["us-east-2c-3"]: Refreshing state... [id=eipalloc-04d97b3aec8f5fb8a]
module.vpc.aws_eip.nat_secondary["us-east-2a-3"]: Refreshing state... [id=eipalloc-0737f1fdf35a0f975]
module.vpc.aws_eip.nat_secondary["us-east-2c-1"]: Refreshing state... [id=eipalloc-0241d507f34cdb0b5]
module.vpc.aws_eip.nat_secondary["us-east-2b-1"]: Refreshing state... [id=eipalloc-0c8d74e3dcfb2dad0]
module.vpc.aws_eip.nat_secondary["us-east-2b-2"]: Refreshing state... [id=eipalloc-0403ed9359182b72c]
module.vpc.aws_eip.nat_secondary["us-east-2b-0"]: Refreshing state... [id=eipalloc-055182abe5c634ddc]
module.vpc.aws_eip.nat_secondary["us-east-2c-2"]: Refreshing state... [id=eipalloc-08e66df79eddc18b5]
module.vpc.aws_eip.nat_secondary["us-east-2c-4"]: Refreshing state... [id=eipalloc-005e21ac878c4db34]
module.vpc.aws_eip.nat_secondary["us-east-2a-6"]: Refreshing state... [id=eipalloc-077cbc910a56d08fd]
module.vpc.aws_eip.nat_secondary["us-east-2c-6"]: Refreshing state... [id=eipalloc-057df768d859ed17e]
module.vpc.aws_eip.nat_secondary["us-east-2c-5"]: Refreshing state... [id=eipalloc-06c020042f283554a]
module.vpc.aws_eip.nat_secondary["us-east-2a-2"]: Refreshing state... [id=eipalloc-0e53d306d25151b0e]
module.vpc.aws_eip.nat_secondary["us-east-2b-3"]: Refreshing state... [id=eipalloc-0b95441aa4e161db2]
module.vpc.aws_eip.nat_secondary["us-east-2a-4"]: Refreshing state... [id=eipalloc-0bd8a5e170892bb0b]
module.vpc.aws_eip.nat_secondary["us-east-2a-1"]: Refreshing state... [id=eipalloc-0a90e8e5b75a3fe45]
module.vpc.aws_eip.nat_secondary["us-east-2c-0"]: Refreshing state... [id=eipalloc-09bd4b74b1a8ca6ac]
module.vpc.aws_eip.nat_secondary["us-east-2b-5"]: Refreshing state... [id=eipalloc-06f0755f7542d77fa]
module.vpc.aws_eip.nat_secondary["us-east-2a-5"]: Refreshing state... [id=eipalloc-095865342b4c692ac]
module.vpc.aws_eip.nat_secondary["us-east-2b-4"]: Refreshing state... [id=eipalloc-08683a31d5967bff6]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-005f847cdca1f2143]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-0d0f31615161dab0f]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-028a6f03785f6bca2]
module.eks.aws_eks_cluster.this: Refreshing state... [id=lf-prod-aws-ue2]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-0caff297f1b93f0c7]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-095cd56cd812b4931]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0288194135c91a55d]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-0d7230758d05b4f20]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-0ce64842bfadf32b0]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-0d0497dd1d2a111f5]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=lf-prod-aws-ue2:vpc-cni]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=lf-prod-aws-ue2:kube-proxy]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_access_entry.cluster_admin["lf_osdc_admin"]: Refreshing state... [id=lf-prod-aws-ue2:arn:aws:iam::391835788720:role/lf_osdc_admin]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-062d0b42e1b1ca1af]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-05de05c204a439484]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-0feb6707491379e22]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0cbaf74e1bd57a865]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=033a163afb2babc26f7883e642621ac361c93d61]
module.eks.aws_eks_node_group.base: Refreshing state... [id=lf-prod-aws-ue2:lf-prod-aws-ue2-base-nodes]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::391835788720:oidc-provider/oidc.eks.us-east-2.amazonaws.com/id/43EEAC690CC76E15781134A4FC06EDCE]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=lf-prod-aws-ue2-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=796338164]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=lf-prod-aws-ue2-ebs-csi-driver-role]
module.eks.aws_eks_access_policy_association.cluster_admin["lf_osdc_admin"]: Refreshing state... [id=lf-prod-aws-ue2#arn:aws:iam::391835788720:role/lf_osdc_admin#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=lf-prod-aws-ue2:coredns]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=lf-prod-aws-ue2-harbor-registry/arn:aws:iam::391835788720:policy/lf-prod-aws-ue2-harbor-registry]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=lf-prod-aws-ue2-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=lf-prod-aws-ue2:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (lf-prod-aws-ue2) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=lf-prod-aws-ue2-karpenter-instance-state-change]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=lf-prod-aws-ue2-karpenter-spot-interruption]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=lf-prod-aws-ue2-karpenter-scheduled-change]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=lf-prod-aws-ue2-karpenter-rebalance]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/391835788720/lf-prod-aws-ue2-karpenter]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/391835788720/lf-prod-aws-ue2-karpenter]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=lf-prod-aws-ue2-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=lf-prod-aws-ue2-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=lf-prod-aws-ue2-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=lf-prod-aws-ue2-karpenter-spot-interruption-KarpenterSpotInterruption]
data.terraform_remote_state.base: Read complete after 1s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-06c1f2ed8ffb1ddfa,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=lf-prod-aws-ue2-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0515848329e5dc53a"]: Refreshing state... [id=subnet-0515848329e5dc53a,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-06a9b2e4ea40968b6"]: Refreshing state... [id=subnet-06a9b2e4ea40968b6,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::391835788720:policy/lf-prod-aws-ue2-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0ae8d251d3a0336ca"]: Refreshing state... [id=subnet-0ae8d251d3a0336ca,karpenter.sh/discovery]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=lf-prod-aws-ue2-karpenter-controller-20260608235145776400000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (lf-prod-aws-ue2) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::391835788720:policy/lf-prod-aws-ue2-pypi-wheel-syncer-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-066c520a3fe657aba]
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::391835788720:policy/lf-prod-aws-ue2-pypi-wants-collector-s3]
data.terraform_remote_state.base: Read complete after 1s
aws_iam_role.efs_csi_driver: Refreshing state... [id=lf-prod-aws-ue2-efs-csi-driver-role]
aws_iam_role.wheel_syncer: Refreshing state... [id=lf-prod-aws-ue2-pypi-wheel-syncer-role]
aws_iam_role.wants_collector: Refreshing state... [id=lf-prod-aws-ue2-pypi-wants-collector-role]
aws_security_group.efs: Refreshing state... [id=sg-02463b5dfca9d70f7]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=lf-prod-aws-ue2-pypi-wheel-syncer-role-20260609154828857900000004]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=lf-prod-aws-ue2-efs-csi-driver-role-20260609154828863300000005]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=lf-prod-aws-ue2-pypi-wants-collector-role-20260609154828857700000003]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=lf-prod-aws-ue2:aws-efs-csi-driver]
aws_efs_mount_target.pypi_cache["subnet-06a9b2e4ea40968b6"]: Refreshing state... [id=fsmt-0e394b1aa15541bde]
aws_efs_mount_target.pypi_cache["subnet-0ae8d251d3a0336ca"]: Refreshing state... [id=fsmt-0d69f035478e5eb52]
aws_efs_mount_target.pypi_cache["subnet-0515848329e5dc53a"]: Refreshing state... [id=fsmt-033866be4ab382a70]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

…ntities.py

Addresses RFC review: the boot-gate validator added a 6th private copy of a
K8s memory-quantity parser (`_parse_mem_quantity`) that was also untested and
less accurate than the existing one (it rejected valid decimal-SI suffixes
like `8G`/`500M`).

Hoist the better implementation (arc-runners' `parse_memory_bytes`, which
already handled binary + decimal SI) into the shared `scripts/python/` —
where `generate_nodepools.py` already imports `instance_specs`/`nodepool_defs`
from, so no cross-module dependency — and give it direct unit tests
(`test_quantities.py`). Both `generate_nodepools.py` and `generate_runners.py`
now import the single shared `parse_memory_bytes`; the local copies are gone.

Net: one tested, decimal-SI-correct parser instead of two (and a path to
retire the remaining copies in daemonset_overhead/analyze_node_utilization/
node-compactor as follow-up). The gate validator inherits real coverage.
…ession test

Addresses RFC review: memoryManagerPolicy and topologyManagerPolicy are
independent kubelet knobs, and the kubelet accepts Static under any topology
policy. The previous hard requirement (Static => single-numa-node, else raise)
conflated two settings the API keeps separate.

- Static no longer requires single-numa-node. It still ALWAYS validates the
  reservedMemory boot gate (a real kubelet requirement, independent of
  topology). When enabled without single-numa-node it now WARNS (stderr) rather
  than raising — that combo is a likely misconfig (memory is NUMA-reserved but
  the scheduler won't align it) but is valid and the operator's call.
- Add test_no_keys_leaves_config_untouched: a def WITHOUT any Memory Manager
  keys emits the exact pre-feature kubelet.config — exact key set, and no
  blank-line artifact at the empty-block injection point. Guards every existing
  nodepool def against drift from this feature.
- Replace the hard-raise test with test_static_without_single_numa_warns_but_emits.
@georgehong georgehong changed the title [RFC] nodepools: add kubelet Memory Manager Static support (#696) [RFC] surface memory control knobs for nodepools, needed for static memory manager Jun 17, 2026
Addresses RFC review: kubeReserved / systemReserved / evictionHard are GENERAL
kubelet settings (they change node-allocatable memory and the eviction
threshold, and EKS already sets them by default) — they are NOT specific to
Memory Manager Static. The previous code only plumbed them through the Static
block and rejected them otherwise, conflating independent knobs.

Rework `_memory_manager_block` -> `_kubelet_memory_block`: each knob is emitted
into kubelet.config ONLY when the def specifies it, and propagates on its own:
  - memory_manager_policy          -> memoryManagerPolicy
  - kube_reserved_memory           -> kubeReserved.memory
  - system_reserved_memory         -> systemReserved.memory
  - eviction_hard_memory_available -> evictionHard.memory.available
  - reserved_memory                -> reservedMemory  (still Static-only — it is
                                      consumed solely by the Memory Manager)

Boot gate: validated only when all three reservation terms are pinned alongside
Static (self-contained); if any is left to the EKS default — unknown at
generation — warn that the gate is unvalidated rather than guess. Static still
requires reserved_memory; reserved_memory without Static still errors.

Tests: independent-knob emission (kubeReserved/evictionHard/systemReserved
alone), "only specified blocks included", unpinned-terms warn-but-emit, Static-
without-reserved raises. Fully-pinned p4d output is byte-identical to before.
Addresses RFC review (function too large, mixed concerns). Matches the file's
convention of dedicated validators (_validate_fleet,
_validate_startup_taints_registry) kept separate from the thin YAML emitters.

- _validate_kubelet_memory(nodepool_def, topology_policy): all the checks +
  warnings + the boot-gate sum (uses early-return to flatten nesting).
- _kubelet_memory_block(nodepool_def): pure 1:1 translation — one block per key
  present, no validation. The three same-shaped knobs
  (kubeReserved/systemReserved/evictionHard) are driven by a small table
  (_KUBELET_MEMORY_RESERVATIONS); memoryManagerPolicy and reservedMemory are the
  two special shapes.

generate_nodepool_yaml now calls validate-then-translate. No behavior change:
generated output is byte-identical (verified on the fully-pinned p4d block) and
all tests pass unchanged.
Add `just render-nodepool <def> [modules]` — renders a single nodepool
def to stdout (generation only, no cluster required) so operators can
preview the generated userData / kubelet config before deploying.

Document the nodeadm deep-merge assumption in generate_nodepools.py:
setting e.g. kubeReserved.memory relies on nodeadm preserving the
EKS-default sibling keys (cpu, ephemeral-storage, nodefs.*). This was
empirically validated on a real p4d (2026-06-17) but is an
implementation detail, not a contractual guarantee. The comment
describes the re-verification procedure and fallback mitigation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant