[TEST-ONLY] Enable NUMA modules on arc-staging with g4dn.metal (T4) #748

Draft
georgehong wants to merge 5 commits into gh/georgehong/11/base from gh/georgehong/11/head

georgehong (Contributor) commented Jun 12, 2026

Stack from ghstack (oldest at bottom):

Temporary test configuration: DO NOT MERGE. This is the T4 counterpart of the
A100 [TEST-ONLY] commit. It validates the same NUMA pipeline on g4dn.metal (T4)
because AWS does not offer A100/p4d in us-west-1 (arc-staging's region), whereas
g4dn.metal is available on-demand there.

  • Add nfd + numa-scheduler to arc-staging modules; remove from prod clusters
  • Point NFD topology-updater + taint-remover nodeSelector at g4dn-metal-numa
  • Gate the nfd-topology startup taint on the g4dn-metal-numa fleet
  • Add g4dn-metal-numa nodepool: single-numa-node, capped to ONE node
    (limits.nvidia.com/gpu: 8) so 1-GPU + 4-GPU runners pack one 2-NUMA box
  • Add a nodepool limits passthrough to generate_nodepools.py
  • Add 1-GPU and 4-GPU T4 runner defs (4-GPU uses scheduler_name: numa-scheduler)
  • Add cleanup-arc-staging.sh for teardown
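
The limits passthrough could look roughly like the sketch below. This is not the code in generate_nodepools.py: the function name `build_nodepool`, the definition keys, and the output layout are all assumptions made for illustration. It only shows the idea that an optional `limits` mapping in a nodepool definition is copied through unchanged, so a definition can cap the fleet at 8 GPUs, which is one g4dn.metal node.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def build_nodepool(definition: dict[str, Any]) -> dict[str, Any]:
    """Render a nodepool spec from a definition (hypothetical schema).

    If the definition carries a `limits` mapping it is passed through
    unchanged; otherwise no `limits` key is emitted at all.
    """
    spec: dict[str, Any] = {
        "name": definition["name"],
        "instance_types": list(definition["instance_types"]),
    }
    limits = definition.get("limits")
    if limits:
        spec["limits"] = deepcopy(limits)
    return spec


# Hypothetical definitions: one capped NUMA pool, one pool without limits.
numa_pool = build_nodepool(
    {
        "name": "g4dn-metal-numa",
        "instance_types": ["g4dn.metal"],
        "limits": {"nvidia.com/gpu": 8},
    }
)
plain_pool = build_nodepool({"name": "example", "instance_types": ["c7a.48xlarge"]})
print(numa_pool["limits"])
print("limits" in plain_pool)
```

Emitting the key only when it is present keeps the output for existing nodepool definitions unchanged.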

g4dn.metal = 2 sockets x 4 T4 (2 NUMA x 4 GPU), topologically identical to
p4d/p5. cpuManagerPolicy=static and Guaranteed-QoS runner pods are already in
place, so CPU+GPU NUMA alignment applies without workload changes.
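
To illustrate how a 1-GPU and a 4-GPU runner can share one 2-NUMA box, here is a minimal sketch of single-NUMA-node GPU placement. It models GPUs only (not CPU or memory), and `place_pod` and the two-node layout are illustrative assumptions, not the scheduler's or kubelet's actual logic.

```python
from __future__ import annotations


def place_pod(free_gpus_per_numa: list[int], gpus_needed: int) -> int | None:
    """Return the index of the first NUMA node that can hold the whole request.

    Under a single-NUMA-node policy a request is never split across NUMA
    nodes, so None is returned when no single node has enough free GPUs.
    """
    for node, free in enumerate(free_gpus_per_numa):
        if free >= gpus_needed:
            free_gpus_per_numa[node] -= gpus_needed
            return node
    return None


# g4dn.metal as described above: 2 NUMA nodes x 4 T4 GPUs.
free = [4, 4]
first_4gpu = place_pod(free, 4)   # fills NUMA node 0
one_gpu = place_pod(free, 1)      # lands on NUMA node 1
second_4gpu = place_pod(free, 4)  # only 3 GPUs left on node 1: cannot be aligned
print(first_4gpu, one_gpu, second_4gpu, free)
```

In this model the second 4-GPU request is refused even though 3 GPUs are idle, which is the trade-off a single-NUMA-node policy makes in exchange for alignment.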

Cleanup: bash modules/nfd/scripts/cleanup-arc-staging.sh
Then drop this commit: git checkout numa-aware-scheduling (or git reset --hard HEAD~1)

[ghstack-poisoned]

github-actions Bot commented Jun 12, 2026

Capacity report

commit 808e0dbd · run log

✅ simulate-cluster
Installed 1 package in 1ms
Monte Carlo Cluster Simulation
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Seed: 42  |  MAPE threshold: 15%  |  Runners: 45  |  DaemonSets: 17
Peak target runner types: 30 (mapped from 38 old labels)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cluster Simulation Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Skipped labels (1):
  l-arm64g2-6-32: no runner def

Nodes by instance type:

  Instance Type          Nodes  vCPU Used vCPU Total   Mem Used  Mem Total   GPU
  ──────────────────────────────────────────────────────────────────────────────
  c7a.48xlarge             261   44794.2c   49871.9c  87800.8Gi  90312.1Gi     -
  c7i.metal-24xl            37    3415.8c    3526.8c   6197.9Gi   6231.7Gi     -
  g4dn.12xlarge            162    7341.8c    7661.0c  27946.6Gi  28109.4Gi 648/648
  g4dn.8xlarge              89    2609.5c    2788.4c  10280.4Gi  10342.3Gi 89/89
  g4dn.metal                87    8205.8c    8279.8c  29972.3Gi  30076.9Gi 696/696
  g5.12xlarge               49    2220.7c    2317.2c   8208.0Gi   8237.5Gi 196/196
  g5.48xlarge               41    7762.1c    7828.1c  28884.9Gi  28910.0Gi 328/328
  g5.8xlarge               603   17680.0c   18892.0c  68446.4Gi  68931.6Gi 603/603
  g6.12xlarge               24    1087.7c    1135.0c   4140.2Gi   4164.4Gi 96/96
  g6.8xlarge               377   11053.6c   11811.4c  42793.2Gi  43096.5Gi 377/377
  m6i.32xlarge              26    3258.3c    3308.2c  12051.3Gi  12077.9Gi     -
  m7g.8xlarge               61     995.5c    1920.3c   3813.1Gi   6992.2Gi     -
  m7g.metal                 30    1869.6c    1902.0c   6795.3Gi   6830.4Gi     -
  m7i.48xlarge              48    8192.0c    9171.8c  32363.6Gi  33658.7Gi     -
  m8g.48xlarge               7    1093.4c    1337.6c   4188.2Gi   4908.6Gi     -
  r7a.48xlarge             137   21506.4c   26178.0c 170772.3Gi 193392.5Gi     -
  r7g.16xlarge             122    7481.0c    7734.8c  56548.2Gi  56673.4Gi     -

Deployment accuracy:

  Total deployed: 6208 / 7294 target
  Weighted MAPE: 15.0%
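
The 15.0% figure is consistent with a target-weighted MAPE, meaning total absolute deviation divided by total target. That definition is an assumption (the simulator's source is not shown here), but applying it to the Deployed and Target columns of the per-runner table in this report reproduces both headline numbers.

```python
# (deployed, target) pairs copied from the per-runner table in this report.
ROWS = [
    (61, 76), (122, 153), (67, 76), (30, 39), (37, 45), (87, 91),
    (41, 42), (603, 695), (377, 422), (49, 80), (24, 29), (26, 24),
    (130, 174), (354, 384), (22, 30), (19, 18), (68, 89), (1146, 1384),
    (12, 15), (89, 104), (13, 12), (48, 65), (162, 183), (151, 189),
    (366, 417), (2054, 2400), (26, 28), (2, 2), (22, 28),
]


def weighted_mape(rows):
    """Sum of |deployed - target| over sum of target, as a percentage."""
    abs_error = sum(abs(d - t) for d, t in rows)
    total = sum(t for _, t in rows)
    return 100.0 * abs_error / total


total_deployed = sum(d for d, _ in ROWS)
total_target = sum(t for _, t in ROWS)
mape = weighted_mape(ROWS)
print(f"{total_deployed} / {total_target} target, weighted MAPE {mape:.1f}%")
```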

  Runner                              Deployed   Target     Diff
  ───────────────────────────────────────────────────────────────
  l-arm64g3-16-62                           61       76      -15
  l-arm64g3-61-463                         122      153      -31
  l-arm64g4-16-62                           67       76       -9
  l-barm64g3-62-226                         30       39       -9
  l-bx86iamx-92-167                         37       45       -8
  l-bx86iavx512-94-344-t4-8                 87       91       -4
  l-x86aavx2-189-704-a10g-8                 41       42       -1
  l-x86aavx2-29-113-a10g                   603      695      -92
  l-x86aavx2-29-113-l4                     377      422      -45
  l-x86aavx2-45-167-a10g-4                  49       80      -31
  l-x86aavx2-45-172-l4-4                    24       29       -5
  l-x86aavx512-125-463                      26       24       +2
  l-x86iamx-32-128                         130      174      -44
  l-x86iamx-8-32                           354      384      -30
  l-x86iavx2-40-160                         22       30       -8
  l-x86iavx2-8-32                           19       18       +1
  l-x86iavx512-16-128                       68       89      -21
  l-x86iavx512-16-32                      1146     1384     -238
  l-x86iavx512-2-4                          12       15       -3
  l-x86iavx512-29-115-t4                    89      104      -15
  l-x86iavx512-32-256                       13       12       +1
  l-x86iavx512-37-68                        48       65      -17
  l-x86iavx512-45-172-t4-4                 162      183      -21
  l-x86iavx512-46-85                       151      189      -38
  l-x86iavx512-48-384                      366      417      -51
  l-x86iavx512-8-16                       2054     2400     -346
  l-x86iavx512-8-64                         26       28       -2
  l-x86iavx512-94-192                        2        2       +0
  l-x86iavx512-94-768                       22       28       -6

Cluster-wide utilization:

  vCPU:    90.9%  (150568 / 165664 cores)
  Memory:  95.0%  (601203 / 632946 GiB)
  GPU:    100.0%  (3033 / 3033 GPUs across 1432 nodes)

  Total nodes: 2161
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ analyze-utilization
Installed 1 package in 1ms
Node Utilization Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Runner def dirs: /home/runner/work/ci-infra/ci-infra/osdc/modules/arc-runners/defs, /home/runner/work/ci-infra/ci-infra/osdc/modules/arc-runners-b200/defs, /home/runner/work/ci-infra/ci-infra/osdc/modules/arc-runners-h100/defs
NodePool def dirs: /home/runner/work/ci-infra/ci-infra/osdc/modules/nodepools/defs, /home/runner/work/ci-infra/ci-infra/osdc/modules/nodepools-b200/defs, /home/runner/work/ci-infra/ci-infra/osdc/modules/nodepools-h100/defs
Utilization threshold: 90.0%

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: c7a.48xlarge
  Total: 192 vCPU, 384Gi advertised (355.2Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 346.0Gi RAM

  Runners targeting this node:
    - l-x86iavx512-16-32: 16320m CPU, 32.5Gi RAM (job: 16c+32.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-2-4: 2320m CPU, 4.5Gi RAM (job: 2c+4.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-37-68: 37320m CPU, 68.5Gi RAM (job: 37c+68.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-46-85: 46320m CPU, 85.5Gi RAM (job: 46c+85.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-8-16: 8320m CPU, 16.5Gi RAM (job: 8c+16.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-94-192: 94320m CPU, 189.5Gi RAM (job: 94c+189.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iavx512-16-32: 10 pods
      CPU:  85.4% (163200m / 191080m) waste: 27880m (27.9 cores)
      MEM:  94.0% (325.1Gi / 346.0Gi) waste: 20.9Gi
      Bottleneck: MEM
    l-x86iavx512-2-4: 76 pods
      CPU:  92.3% (176320m / 191080m) waste: 14760m (14.8 cores)
      MEM:  99.1% (342.7Gi / 346.0Gi) waste: 3.3Gi
      Bottleneck: MEM
    l-x86iavx512-37-68: 5 pods
      CPU:  97.7% (186600m / 191080m) waste: 4480m (4.5 cores)
      MEM:  99.0% (342.5Gi / 346.0Gi) waste: 3.5Gi
      Bottleneck: CPU
    l-x86iavx512-46-85: 4 pods
      CPU:  97.0% (185280m / 191080m) waste: 5800m (5.8 cores)
      MEM:  98.8% (342.0Gi / 346.0Gi) waste: 4.0Gi
      Bottleneck: CPU
    l-x86iavx512-8-16: 20 pods
      CPU:  87.1% (166400m / 191080m) waste: 24680m (24.7 cores)
      MEM:  95.4% (330.2Gi / 346.0Gi) waste: 15.8Gi
      Bottleneck: MEM
    l-x86iavx512-94-192: 1 pods
      CPU:  49.4% (94320m / 191080m) waste: 96760m (96.8 cores)
      MEM:  54.8% (189.5Gi / 346.0Gi) waste: 156.5Gi
      Bottleneck: MEM

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 236

    Top 5 most efficient:
      #1 [5xl-x86iavx512-37-68]
         CPU:  97.7%  MEM:  99.0%  waste: 4.5c + 3.5Gi
      #2 [1xl-x86iavx512-2-4, 2xl-x86iavx512-37-68, 2xl-x86iavx512-46-85, 2xl-x86iavx512-8-16]
         CPU:  97.5%  MEM:  99.9%  waste: 4.8c + 466Mi
      #3 [12xl-x86iavx512-2-4, 3xl-x86iavx512-37-68, 1xl-x86iavx512-46-85]
         CPU:  97.4%  MEM:  99.7%  waste: 5.0c + 888Mi
      #4 [1xl-x86iavx512-16-32, 1xl-x86iavx512-2-4, 2xl-x86iavx512-37-68, 2xl-x86iavx512-46-85]
         CPU:  97.3%  MEM:  99.7%  waste: 5.2c + 988Mi
      #5 [8xl-x86iavx512-2-4, 2xl-x86iavx512-37-68, 2xl-x86iavx512-46-85]
         CPU:  97.3%  MEM:  99.4%  waste: 5.2c + 1.9Gi

    Bottom 5 least efficient (money on the table):
      #1 [1xl-x86iavx512-2-4, 9xl-x86iavx512-8-16, 1xl-x86iavx512-94-192]
         CPU:  89.8%  MEM:  99.0%  waste: 19.6c + 3.4Gi
      #2 [2xl-x86iavx512-16-32, 12xl-x86iavx512-2-4, 2xl-x86iavx512-8-16, 1xl-x86iavx512-94-192]
         CPU:  89.7%  MEM:  98.7%  waste: 19.6c + 4.4Gi
      #3 [4xl-x86iavx512-16-32, 5xl-x86iavx512-2-4, 1xl-x86iavx512-94-192]
         CPU:  89.6%  MEM:  98.9%  waste: 19.9c + 3.9Gi
      #4 [1xl-x86iavx512-16-32, 1xl-x86iavx512-2-4, 7xl-x86iavx512-8-16, 1xl-x86iavx512-94-192]
         CPU:  89.6%  MEM:  98.9%  waste: 19.9c + 3.9Gi
      #5 [2xl-x86iavx512-16-32, 1xl-x86iavx512-2-4, 5xl-x86iavx512-8-16, 1xl-x86iavx512-94-192]
         CPU:  89.4%  MEM:  98.7%  waste: 20.2c + 4.4Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: c7i.12xlarge
  Total: 48 vCPU, 96Gi advertised (88.8Gi actual)
  Kubelet reserved: 190m CPU, 2.9Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 47440m CPU (47.4 cores), 85.0Gi RAM

  Runners targeting this node:
    - l-x86iamx-14-27: 14320m CPU, 27.5Gi RAM (job: 14c+27.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-22-41: 22320m CPU, 41.5Gi RAM (job: 22c+41.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-46-84: 46320m CPU, 84.5Gi RAM (job: 46c+84.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-8-16: 8320m CPU, 16.5Gi RAM (job: 8c+16.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iamx-14-27: 3 pods
      CPU:  90.6% (42960m / 47440m) waste: 4480m (4.5 cores)
      MEM:  97.1% (82.5Gi / 85.0Gi) waste: 2.5Gi
      Bottleneck: CPU
    l-x86iamx-22-41: 2 pods
      CPU:  94.1% (44640m / 47440m) waste: 2800m (2.8 cores)
      MEM:  97.6% (83.0Gi / 85.0Gi) waste: 2.0Gi
      Bottleneck: CPU
    l-x86iamx-46-84: 1 pods
      CPU:  97.6% (46320m / 47440m) waste: 1120m (1.1 cores)
      MEM:  99.4% (84.5Gi / 85.0Gi) waste: 530Mi
      Bottleneck: CPU
    l-x86iamx-8-16: 5 pods
      CPU:  87.7% (41600m / 47440m) waste: 5840m (5.8 cores)
      MEM:  97.1% (82.5Gi / 85.0Gi) waste: 2.5Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 8

    Top 5 most efficient:
      #1 [1xl-x86iamx-46-84]
         CPU:  97.6%  MEM:  99.4%  waste: 1.1c + 530Mi
      #2 [2xl-x86iamx-22-41]
         CPU:  94.1%  MEM:  97.6%  waste: 2.8c + 2.0Gi
      #3 [3xl-x86iamx-14-27]
         CPU:  90.6%  MEM:  97.1%  waste: 4.5c + 2.5Gi
      #4 [5xl-x86iamx-8-16]
         CPU:  87.7%  MEM:  97.1%  waste: 5.8c + 2.5Gi
      #5 [1xl-x86iamx-14-27, 3xl-x86iamx-8-16]
         CPU:  82.8%  MEM:  90.6%  waste: 8.2c + 8.0Gi

    Bottom 3 least efficient (money on the table):
      #1 [1xl-x86iamx-22-41, 2xl-x86iamx-8-16]
         CPU:  82.1%  MEM:  87.7%  waste: 8.5c + 10.5Gi
      #2 [2xl-x86iamx-14-27, 1xl-x86iamx-8-16]
         CPU:  77.9%  MEM:  84.1%  waste: 10.5c + 13.5Gi
      #3 [1xl-x86iamx-14-27, 1xl-x86iamx-22-41]
         CPU:  77.2%  MEM:  81.2%  waste: 10.8c + 16.0Gi
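
A "maximal" combo here is a mix of pods that fits on the node and to which no further pod of any type can be added. The brute-force sketch below applies that definition to the c7i.12xlarge values printed above and arrives at the same count of 8. It is an illustrative reconstruction: the function names are hypothetical, the analyzer's real enumeration strategy is not shown in this report, and the 85.0Gi allocatable figure is taken as printed (rounded).

```python
from itertools import product

ALLOC_CPU_M = 47440        # allocatable CPU for c7i.12xlarge, from the block above
ALLOC_MEM_MI = 85 * 1024   # 85.0Gi allocatable RAM, in Mi

# Runner requests = job + hooks (320m CPU, 522Mi RAM), as listed above.
RUNNERS = {
    "l-x86iamx-14-27": (14320, 27 * 1024 + 522),
    "l-x86iamx-22-41": (22320, 41 * 1024 + 522),
    "l-x86iamx-46-84": (46320, 84 * 1024 + 522),
    "l-x86iamx-8-16": (8320, 16 * 1024 + 522),
}


def fits(counts):
    """True if the given pod counts fit within allocatable CPU and memory."""
    cpu = sum(n * RUNNERS[name][0] for name, n in counts.items())
    mem = sum(n * RUNNERS[name][1] for name, n in counts.items())
    return cpu <= ALLOC_CPU_M and mem <= ALLOC_MEM_MI


def maximal_combos():
    """Enumerate feasible combos that cannot accept one more pod of any type."""
    names = list(RUNNERS)
    # Per-type upper bound: pods of that type that fit on an empty node by CPU.
    ranges = [range(ALLOC_CPU_M // RUNNERS[n][0] + 1) for n in names]
    result = []
    for combo in product(*ranges):
        counts = dict(zip(names, combo))
        if not fits(counts):
            continue
        if all(not fits({**counts, n: counts[n] + 1}) for n in names):
            result.append({n: c for n, c in counts.items() if c})
    return result


combos = maximal_combos()
print(len(combos))
for combo in combos:
    print(combo)
```

Exhaustive enumeration is fine at this size, but the search space grows multiplicatively with the number of runner types, so a node with 236 maximal combos would likely call for pruning.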

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: c7i.metal-24xl
  Total: 96 vCPU, 192Gi advertised (177.6Gi actual)
  Kubelet reserved: 310m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 95320m CPU (95.3 cores), 168.4Gi RAM

  Runners targeting this node:
    - l-bx86iamx-92-167: 92320m CPU, 167.5Gi RAM (job: 92c+167.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iamx-92-167: 1 pods
      CPU:  96.9% (92320m / 95320m) waste: 3000m (3.0 cores)
      MEM:  99.5% (167.5Gi / 168.4Gi) waste: 936Mi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-bx86iamx-92-167]
         CPU:  96.9%  MEM:  99.5%  waste: 3.0c + 936Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g4dn.12xlarge
  Total: 48 vCPU, 192Gi advertised (177.6Gi actual), 4 GPU
  Kubelet reserved: 190m CPU, 2.9Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 47290m CPU (47.3 cores), 173.5Gi RAM, 4 GPU

  Runners targeting this node:
    - l-x86iavx512-45-172-t4-4: 45320m CPU, 172.5Gi RAM, 4 GPU (job: 45c+172.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iavx512-45-172-t4-4: 1 pods
      CPU:  95.8% (45320m / 47290m) waste: 1970m (2.0 cores)
      MEM:  99.4% (172.5Gi / 173.5Gi) waste: 1.0Gi
      GPU: 100.0% (4 / 4)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86iavx512-45-172-t4-4]
         CPU:  95.8%  MEM:  99.4%  GPU: 100.0%  waste: 2.0c + 1.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g4dn.8xlarge
  Total: 32 vCPU, 128Gi advertised (118.4Gi actual), 1 GPU
  Kubelet reserved: 150m CPU, 993Mi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 31330m CPU (31.3 cores), 116.2Gi RAM, 1 GPU

  Runners targeting this node:
    - l-x86iavx512-29-115-t4: 29320m CPU, 115.5Gi RAM, 1 GPU (job: 29c+115.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iavx512-29-115-t4: 1 pods
      CPU:  93.6% (29320m / 31330m) waste: 2010m (2.0 cores)
      MEM:  99.4% (115.5Gi / 116.2Gi) waste: 712Mi
      GPU: 100.0% (1 / 1)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86iavx512-29-115-t4]
         CPU:  93.6%  MEM:  99.4%  GPU: 100.0%  waste: 2.0c + 712Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g4dn.metal
  Total: 96 vCPU, 384Gi advertised (355.2Gi actual), 8 GPU
  Kubelet reserved: 310m CPU, 8.3Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 95170m CPU (95.2 cores), 345.7Gi RAM, 8 GPU

  Runners targeting this node:
    - l-bx86iavx512-11-43-t4-1: 60320m CPU, 43.5Gi RAM, 1 GPU (job: 60c+43.0Gi, hooks: 320m+522Mi)
    - l-bx86iavx512-47-172-t4-4: 47320m CPU, 172.5Gi RAM, 4 GPU (job: 47c+172.0Gi, hooks: 320m+522Mi)
    - l-bx86iavx512-94-344-t4-8: 94320m CPU, 344.5Gi RAM, 8 GPU (job: 94c+344.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iavx512-11-43-t4-1: 1 pods
      CPU:  63.4% (60320m / 95170m) waste: 34850m (34.9 cores)
      MEM:  12.6% (43.5Gi / 345.7Gi) waste: 302.2Gi
      GPU:  12.5% (1 / 8)
      Bottleneck: CPU
    l-bx86iavx512-47-172-t4-4: 2 pods
      CPU:  99.4% (94640m / 95170m) waste: 530m (0.5 cores)
      MEM:  99.8% (345.0Gi / 345.7Gi) waste: 708Mi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-bx86iavx512-94-344-t4-8: 1 pods
      CPU:  99.1% (94320m / 95170m) waste: 850m (0.8 cores)
      MEM:  99.7% (344.5Gi / 345.7Gi) waste: 1.2Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 3

    Top 3 most efficient:
      #1 [2xl-bx86iavx512-47-172-t4-4]
         CPU:  99.4%  MEM:  99.8%  GPU: 100.0%  waste: 0.5c + 708Mi
      #2 [1xl-bx86iavx512-94-344-t4-8]
         CPU:  99.1%  MEM:  99.7%  GPU: 100.0%  waste: 0.8c + 1.2Gi
      #3 [1xl-bx86iavx512-11-43-t4-1]
         CPU:  63.4%  MEM:  12.6%  GPU:  12.5%  waste: 34.9c + 302.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g5.12xlarge
  Total: 48 vCPU, 192Gi advertised (177.6Gi actual), 4 GPU
  Kubelet reserved: 190m CPU, 8.3Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 47290m CPU (47.3 cores), 168.1Gi RAM, 4 GPU

  Runners targeting this node:
    - l-x86aavx2-45-167-a10g-4: 45320m CPU, 167.5Gi RAM, 4 GPU (job: 45c+167.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-45-167-a10g-4: 1 pods
      CPU:  95.8% (45320m / 47290m) waste: 1970m (2.0 cores)
      MEM:  99.6% (167.5Gi / 168.1Gi) waste: 616Mi
      GPU: 100.0% (4 / 4)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-45-167-a10g-4]
         CPU:  95.8%  MEM:  99.6%  GPU: 100.0%  waste: 2.0c + 616Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g5.48xlarge
  Total: 192 vCPU, 768Gi advertised (710.4Gi actual), 8 GPU
  Kubelet reserved: 550m CPU, 4.1Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 190930m CPU (190.9 cores), 705.1Gi RAM, 8 GPU

  Runners targeting this node:
    - l-x86aavx2-189-704-a10g-8: 189320m CPU, 704.5Gi RAM, 8 GPU (job: 189c+704.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-189-704-a10g-8: 1 pods
      CPU:  99.2% (189320m / 190930m) waste: 1610m (1.6 cores)
      MEM:  99.9% (704.5Gi / 705.1Gi) waste: 627Mi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-189-704-a10g-8]
         CPU:  99.2%  MEM:  99.9%  GPU: 100.0%  waste: 1.6c + 627Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g5.8xlarge
  Total: 32 vCPU, 128Gi advertised (118.4Gi actual), 1 GPU
  Kubelet reserved: 150m CPU, 2.9Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 31330m CPU (31.3 cores), 114.3Gi RAM, 1 GPU

  Runners targeting this node:
    - l-x86aavx2-29-113-a10g: 29320m CPU, 113.5Gi RAM, 1 GPU (job: 29c+113.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-29-113-a10g: 1 pods
      CPU:  93.6% (29320m / 31330m) waste: 2010m (2.0 cores)
      MEM:  99.3% (113.5Gi / 114.3Gi) waste: 824Mi
      GPU: 100.0% (1 / 1)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-29-113-a10g]
         CPU:  93.6%  MEM:  99.3%  GPU: 100.0%  waste: 2.0c + 824Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g6.12xlarge
  Total: 48 vCPU, 192Gi advertised (177.6Gi actual), 4 GPU
  Kubelet reserved: 190m CPU, 2.9Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 47290m CPU (47.3 cores), 173.5Gi RAM, 4 GPU

  Runners targeting this node:
    - l-x86aavx2-45-172-l4-4: 45320m CPU, 172.5Gi RAM, 4 GPU (job: 45c+172.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-45-172-l4-4: 1 pods
      CPU:  95.8% (45320m / 47290m) waste: 1970m (2.0 cores)
      MEM:  99.4% (172.5Gi / 173.5Gi) waste: 1.0Gi
      GPU: 100.0% (4 / 4)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-45-172-l4-4]
         CPU:  95.8%  MEM:  99.4%  GPU: 100.0%  waste: 2.0c + 1.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g6.8xlarge
  Total: 32 vCPU, 128Gi advertised (118.4Gi actual), 1 GPU
  Kubelet reserved: 150m CPU, 2.9Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 31330m CPU (31.3 cores), 114.3Gi RAM, 1 GPU

  Runners targeting this node:
    - l-x86aavx2-29-113-l4: 29320m CPU, 113.5Gi RAM, 1 GPU (job: 29c+113.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-29-113-l4: 1 pods
      CPU:  93.6% (29320m / 31330m) waste: 2010m (2.0 cores)
      MEM:  99.3% (113.5Gi / 114.3Gi) waste: 824Mi
      GPU: 100.0% (1 / 1)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-29-113-l4]
         CPU:  93.6%  MEM:  99.3%  GPU: 100.0%  waste: 2.0c + 824Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m6i.32xlarge
  Total: 128 vCPU, 512Gi advertised (473.7Gi actual)
  Kubelet reserved: 390m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 127240m CPU (127.2 cores), 464.5Gi RAM

  Runners targeting this node:
    - l-x86aavx512-125-463: 125320m CPU, 463.5Gi RAM (job: 125c+463.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx512-125-463: 1 pods
      CPU:  98.5% (125320m / 127240m) waste: 1920m (1.9 cores)
      MEM:  99.8% (463.5Gi / 464.5Gi) waste: 1.0Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx512-125-463]
         CPU:  98.5%  MEM:  99.8%  waste: 1.9c + 1.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m7g.8xlarge
  Total: 32 vCPU, 128Gi advertised (118.4Gi actual)
  Kubelet reserved: 150m CPU, 2.9Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 31480m CPU (31.5 cores), 114.6Gi RAM

  Runners targeting this node:
    - l-arm64g3-16-62: 16320m CPU, 62.5Gi RAM (job: 16c+62.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-arm64g3-16-62: 1 pods
      CPU:  51.8% (16320m / 31480m) waste: 15160m (15.2 cores)
      MEM:  54.5% (62.5Gi / 114.6Gi) waste: 52.1Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-arm64g3-16-62]
         CPU:  51.8%  MEM:  54.5%  waste: 15.2c + 52.1Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m7g.metal
  Total: 64 vCPU, 256Gi advertised (236.9Gi actual)
  Kubelet reserved: 230m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 63400m CPU (63.4 cores), 227.7Gi RAM

  Runners targeting this node:
    - l-barm64g3-62-226: 62320m CPU, 226.5Gi RAM (job: 62c+226.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-barm64g3-62-226: 1 pods
      CPU:  98.3% (62320m / 63400m) waste: 1080m (1.1 cores)
      MEM:  99.5% (226.5Gi / 227.7Gi) waste: 1.2Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-barm64g3-62-226]
         CPU:  98.3%  MEM:  99.5%  waste: 1.1c + 1.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m7i.48xlarge
  Total: 192 vCPU, 768Gi advertised (710.4Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 701.2Gi RAM

  Runners targeting this node:
    - l-x86iamx-32-128: 32320m CPU, 128.5Gi RAM (job: 32c+128.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-8-32: 8320m CPU, 32.5Gi RAM (job: 8c+32.0Gi, hooks: 320m+522Mi)
    - l-x86iavx2-40-160: 40320m CPU, 160.5Gi RAM (job: 40c+160.0Gi, hooks: 320m+522Mi)
    - l-x86iavx2-8-32: 8320m CPU, 32.5Gi RAM (job: 8c+32.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iamx-32-128: 5 pods
      CPU:  84.6% (161600m / 191080m) waste: 29480m (29.5 cores)
      MEM:  91.6% (642.5Gi / 701.2Gi) waste: 58.7Gi
      Bottleneck: CPU
    l-x86iamx-8-32: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  97.4% (682.7Gi / 701.2Gi) waste: 18.5Gi
      Bottleneck: MEM
    l-x86iavx2-40-160: 4 pods
      CPU:  84.4% (161280m / 191080m) waste: 29800m (29.8 cores)
      MEM:  91.6% (642.0Gi / 701.2Gi) waste: 59.2Gi
      Bottleneck: CPU
    l-x86iavx2-8-32: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  97.4% (682.7Gi / 701.2Gi) waste: 18.5Gi
      Bottleneck: MEM

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 131

    Top 5 most efficient:
      #1 [1xl-x86iamx-32-128, 17xl-x86iamx-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi
      #2 [1xl-x86iamx-32-128, 16xl-x86iamx-8-32, 1xl-x86iavx2-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi
      #3 [1xl-x86iamx-32-128, 15xl-x86iamx-8-32, 2xl-x86iavx2-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi
      #4 [1xl-x86iamx-32-128, 14xl-x86iamx-8-32, 3xl-x86iavx2-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi
      #5 [1xl-x86iamx-32-128, 13xl-x86iamx-8-32, 4xl-x86iavx2-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi

    Bottom 5 least efficient (money on the table):
      #1 [1xl-x86iamx-32-128, 1xl-x86iamx-8-32, 3xl-x86iavx2-40-160, 1xl-x86iavx2-8-32]
         CPU:  88.9%  MEM:  96.3%  waste: 21.2c + 26.2Gi
      #2 [1xl-x86iamx-32-128, 3xl-x86iavx2-40-160, 2xl-x86iavx2-8-32]
         CPU:  88.9%  MEM:  96.3%  waste: 21.2c + 26.2Gi
      #3 [4xl-x86iamx-32-128, 1xl-x86iavx2-40-160]
         CPU:  88.8%  MEM:  96.2%  waste: 21.5c + 26.7Gi
      #4 [1xl-x86iamx-8-32, 4xl-x86iavx2-40-160]
         CPU:  88.8%  MEM:  96.2%  waste: 21.5c + 26.7Gi
      #5 [4xl-x86iavx2-40-160, 1xl-x86iavx2-8-32]
         CPU:  88.8%  MEM:  96.2%  waste: 21.5c + 26.7Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m8g.16xlarge
  Total: 64 vCPU, 256Gi advertised (236.9Gi actual)
  Kubelet reserved: 230m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 63400m CPU (63.4 cores), 227.7Gi RAM

  Runners targeting this node:
    - l-barm64g4-62-226: 62320m CPU, 226.5Gi RAM (job: 62c+226.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-barm64g4-62-226: 1 pods
      CPU:  98.3% (62320m / 63400m) waste: 1080m (1.1 cores)
      MEM:  99.5% (226.5Gi / 227.7Gi) waste: 1.2Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-barm64g4-62-226]
         CPU:  98.3%  MEM:  99.5%  waste: 1.1c + 1.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m8g.48xlarge
  Total: 192 vCPU, 768Gi advertised (710.4Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 701.2Gi RAM

  Runners targeting this node:
    - l-arm64g4-16-62: 16320m CPU, 62.5Gi RAM (job: 16c+62.0Gi, hooks: 320m+522Mi)
    - rel-l-arm64g4-16-62: 16320m CPU, 62.5Gi RAM (job: 16c+62.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-arm64g4-16-62: 11 pods
      CPU:  94.0% (179520m / 191080m) waste: 11560m (11.6 cores)
      MEM:  98.1% (687.6Gi / 701.2Gi) waste: 13.6Gi
      Bottleneck: CPU
    rel-l-arm64g4-16-62: 11 pods
      CPU:  94.0% (179520m / 191080m) waste: 11560m (11.6 cores)
      MEM:  98.1% (687.6Gi / 701.2Gi) waste: 13.6Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 12

    Top 5 most efficient:
      #1 [11xl-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #2 [10xl-arm64g4-16-62, 1xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #3 [9xl-arm64g4-16-62, 2xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #4 [8xl-arm64g4-16-62, 3xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #5 [7xl-arm64g4-16-62, 4xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi

    Bottom 5 least efficient (money on the table):
      #1 [4xl-arm64g4-16-62, 7xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #2 [3xl-arm64g4-16-62, 8xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #3 [2xl-arm64g4-16-62, 9xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #4 [1xl-arm64g4-16-62, 10xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #5 [11xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
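
The m8g.48xlarge figures above can be reproduced by hand. The following is a minimal sketch, not the simulator's actual code: it assumes allocatable capacity is the node total minus the kubelet reservation and DaemonSet overhead, and that the homogeneous pod count is limited by whichever of CPU or memory runs out first. The function name `homogeneous_packing` and the variable names are hypothetical; the inputs are taken from the report.

```python
MI_PER_GI = 1024

# m8g.48xlarge inputs from the report (CPU in millicores, memory in Mi).
total_cpu_m = 192 * 1000
actual_mem_mi = 710.4 * MI_PER_GI
kubelet_cpu_m, kubelet_mem_mi = 550, 8.3 * MI_PER_GI
daemonset_cpu_m, daemonset_mem_mi = 370, 934

# Assumption: allocatable = total - kubelet reserved - DaemonSet overhead.
alloc_cpu_m = total_cpu_m - kubelet_cpu_m - daemonset_cpu_m
alloc_mem_mi = int(actual_mem_mi - kubelet_mem_mi - daemonset_mem_mi)

# l-arm64g4-16-62: job (16c + 62Gi) plus hooks (320m + 522Mi).
pod_cpu_m = 16 * 1000 + 320
pod_mem_mi = 62 * MI_PER_GI + 522


def homogeneous_packing(alloc_cpu_m, alloc_mem_mi, pod_cpu_m, pod_mem_mi):
    """Pack identical pods until either CPU or memory is exhausted."""
    pods = min(alloc_cpu_m // pod_cpu_m, alloc_mem_mi // pod_mem_mi)
    used_cpu_m = pods * pod_cpu_m
    used_mem_mi = pods * pod_mem_mi
    return {
        "pods": pods,
        "cpu_pct": 100 * used_cpu_m / alloc_cpu_m,
        "cpu_waste_m": alloc_cpu_m - used_cpu_m,
        "mem_pct": 100 * used_mem_mi / alloc_mem_mi,
        "mem_waste_gi": (alloc_mem_mi - used_mem_mi) / MI_PER_GI,
    }


result = homogeneous_packing(alloc_cpu_m, alloc_mem_mi, pod_cpu_m, pod_mem_mi)
print(alloc_cpu_m, result)
```

Under these assumptions the sketch yields 191080m allocatable CPU and 11 pods with 11560m of CPU left over, matching the report.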

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: p4d.24xlarge
  Total: 96 vCPU, 1152Gi advertised (1065.2Gi actual), 8 GPU
  Kubelet reserved: 310m CPU, 3.0Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 95170m CPU (95.2 cores), 1060.9Gi RAM, 8 GPU

  Runners targeting this node:
    - l-bx86iavx512-88-1000-a100-8: 88320m CPU, 1000.5Gi RAM, 8 GPU (job: 88c+1000.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-11-125-a100: 11320m CPU, 125.5Gi RAM, 1 GPU (job: 11c+125.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-22-250-a100-2: 22320m CPU, 250.5Gi RAM, 2 GPU (job: 22c+250.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-44-500-a100-4: 44320m CPU, 500.5Gi RAM, 4 GPU (job: 44c+500.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iavx512-88-1000-a100-8: 1 pods
      CPU:  92.8% (88320m / 95170m) waste: 6850m (6.8 cores)
      MEM:  94.3% (1000.5Gi / 1060.9Gi) waste: 60.4Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iavx512-11-125-a100: 8 pods
      CPU:  95.2% (90560m / 95170m) waste: 4610m (4.6 cores)
      MEM:  94.6% (1004.1Gi / 1060.9Gi) waste: 56.9Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iavx512-22-250-a100-2: 4 pods
      CPU:  93.8% (89280m / 95170m) waste: 5890m (5.9 cores)
      MEM:  94.4% (1002.0Gi / 1060.9Gi) waste: 58.9Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iavx512-44-500-a100-4: 2 pods
      CPU:  93.1% (88640m / 95170m) waste: 6530m (6.5 cores)
      MEM:  94.4% (1001.0Gi / 1060.9Gi) waste: 59.9Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 10

    Top 5 most efficient:
      #1 [8xl-x86iavx512-11-125-a100]
         CPU:  95.2%  MEM:  94.6%  GPU: 100.0%  waste: 4.6c + 56.9Gi
      #2 [6xl-x86iavx512-11-125-a100, 1xl-x86iavx512-22-250-a100-2]
         CPU:  94.8%  MEM:  94.6%  GPU: 100.0%  waste: 4.9c + 57.4Gi
      #3 [4xl-x86iavx512-11-125-a100, 2xl-x86iavx512-22-250-a100-2]
         CPU:  94.5%  MEM:  94.5%  GPU: 100.0%  waste: 5.2c + 57.9Gi
      #4 [4xl-x86iavx512-11-125-a100, 1xl-x86iavx512-44-500-a100-4]
         CPU:  94.1%  MEM:  94.5%  GPU: 100.0%  waste: 5.6c + 58.4Gi
      #5 [2xl-x86iavx512-11-125-a100, 3xl-x86iavx512-22-250-a100-2]
         CPU:  94.1%  MEM:  94.5%  GPU: 100.0%  waste: 5.6c + 58.4Gi

    Bottom 5 least efficient (money on the table):
      #1 [2xl-x86iavx512-11-125-a100, 1xl-x86iavx512-22-250-a100-2, 1xl-x86iavx512-44-500-a100-4]
         CPU:  93.8%  MEM:  94.4%  GPU: 100.0%  waste: 5.9c + 58.9Gi
      #2 [4xl-x86iavx512-22-250-a100-2]
         CPU:  93.8%  MEM:  94.4%  GPU: 100.0%  waste: 5.9c + 58.9Gi
      #3 [2xl-x86iavx512-22-250-a100-2, 1xl-x86iavx512-44-500-a100-4]
         CPU:  93.5%  MEM:  94.4%  GPU: 100.0%  waste: 6.2c + 59.4Gi
      #4 [2xl-x86iavx512-44-500-a100-4]
         CPU:  93.1%  MEM:  94.4%  GPU: 100.0%  waste: 6.5c + 59.9Gi
      #5 [1xl-bx86iavx512-88-1000-a100-8]
         CPU:  92.8%  MEM:  94.3%  GPU: 100.0%  waste: 6.8c + 60.4Gi
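
The "maximal mixed combos" count for p4d.24xlarge can be checked with a brute-force enumeration. This is a sketch under stated assumptions, not the simulator's implementation: it treats a combo as maximal when it fits within allocatable CPU, memory, and GPU, and adding one more pod of any runner type would not fit. The names `ALLOC`, `RUNNERS`, `fits`, and `maximal_combos` are hypothetical; the capacities come from the report.

```python
from itertools import product

MI_PER_GI = 1024

# Allocatable capacity on p4d.24xlarge, as listed in the report.
ALLOC = {"cpu_m": 95170, "mem_mi": round(1060.9 * MI_PER_GI), "gpu": 8}

# Per-pod requests: job size plus hooks (320m CPU, 522Mi memory).
RUNNERS = {
    "l-bx86iavx512-88-1000-a100-8": {"cpu_m": 88320, "mem_mi": 1000 * MI_PER_GI + 522, "gpu": 8},
    "l-x86iavx512-11-125-a100": {"cpu_m": 11320, "mem_mi": 125 * MI_PER_GI + 522, "gpu": 1},
    "l-x86iavx512-22-250-a100-2": {"cpu_m": 22320, "mem_mi": 250 * MI_PER_GI + 522, "gpu": 2},
    "l-x86iavx512-44-500-a100-4": {"cpu_m": 44320, "mem_mi": 500 * MI_PER_GI + 522, "gpu": 4},
}


def fits(counts):
    """True if the pod counts fit within every allocatable resource."""
    return all(
        sum(n * RUNNERS[name][res] for name, n in counts.items()) <= cap
        for res, cap in ALLOC.items()
    )


def maximal_combos():
    """Enumerate combos that fit and cannot take one more pod of any type."""
    names = list(RUNNERS)
    # Upper bound per runner: how many fit when that runner is alone on the node.
    bounds = [min(ALLOC[r] // RUNNERS[n][r] for r in ALLOC) for n in names]
    found = []
    for combo in product(*(range(b + 1) for b in bounds)):
        counts = dict(zip(names, combo))
        if not fits(counts):
            continue
        if all(not fits({**counts, n: counts[n] + 1}) for n in names):
            found.append(counts)
    return found


combos = maximal_combos()
print(len(combos))
```

With these inputs the GPU count is the binding constraint, so each maximal combo uses all 8 GPUs, and the enumeration finds 10 combos, in line with the report.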

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: p5.48xlarge
  Total: 192 vCPU, 2048Gi advertised (1894.4Gi actual), 8 GPU
  Kubelet reserved: 550m CPU, 2.5Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 190930m CPU (190.9 cores), 1890.7Gi RAM, 8 GPU

  Runners targeting this node:
    - l-bx86iamx-176-1800-h100-8: 176320m CPU, 1800.5Gi RAM, 8 GPU (job: 176c+1800.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-22-225-h100: 22320m CPU, 225.5Gi RAM, 1 GPU (job: 22c+225.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-44-450-h100-2: 44320m CPU, 450.5Gi RAM, 2 GPU (job: 44c+450.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-88-900-h100-4: 88320m CPU, 900.5Gi RAM, 4 GPU (job: 88c+900.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iamx-176-1800-h100-8: 1 pods
      CPU:  92.3% (176320m / 190930m) waste: 14610m (14.6 cores)
      MEM:  95.2% (1800.5Gi / 1890.7Gi) waste: 90.2Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-22-225-h100: 8 pods
      CPU:  93.5% (178560m / 190930m) waste: 12370m (12.4 cores)
      MEM:  95.4% (1804.1Gi / 1890.7Gi) waste: 86.6Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-44-450-h100-2: 4 pods
      CPU:  92.9% (177280m / 190930m) waste: 13650m (13.7 cores)
      MEM:  95.3% (1802.0Gi / 1890.7Gi) waste: 88.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-88-900-h100-4: 2 pods
      CPU:  92.5% (176640m / 190930m) waste: 14290m (14.3 cores)
      MEM:  95.3% (1801.0Gi / 1890.7Gi) waste: 89.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 10

    Top 5 most efficient:
      #1 [8xl-x86iamx-22-225-h100]
         CPU:  93.5%  MEM:  95.4%  GPU: 100.0%  waste: 12.4c + 86.6Gi
      #2 [6xl-x86iamx-22-225-h100, 1xl-x86iamx-44-450-h100-2]
         CPU:  93.4%  MEM:  95.4%  GPU: 100.0%  waste: 12.7c + 87.1Gi
      #3 [4xl-x86iamx-22-225-h100, 2xl-x86iamx-44-450-h100-2]
         CPU:  93.2%  MEM:  95.4%  GPU: 100.0%  waste: 13.0c + 87.6Gi
      #4 [4xl-x86iamx-22-225-h100, 1xl-x86iamx-88-900-h100-4]
         CPU:  93.0%  MEM:  95.3%  GPU: 100.0%  waste: 13.3c + 88.2Gi
      #5 [2xl-x86iamx-22-225-h100, 3xl-x86iamx-44-450-h100-2]
         CPU:  93.0%  MEM:  95.3%  GPU: 100.0%  waste: 13.3c + 88.2Gi

    Bottom 5 least efficient (money on the table):
      #1 [2xl-x86iamx-22-225-h100, 1xl-x86iamx-44-450-h100-2, 1xl-x86iamx-88-900-h100-4]
         CPU:  92.9%  MEM:  95.3%  GPU: 100.0%  waste: 13.7c + 88.7Gi
      #2 [4xl-x86iamx-44-450-h100-2]
         CPU:  92.9%  MEM:  95.3%  GPU: 100.0%  waste: 13.7c + 88.7Gi
      #3 [2xl-x86iamx-44-450-h100-2, 1xl-x86iamx-88-900-h100-4]
         CPU:  92.7%  MEM:  95.3%  GPU: 100.0%  waste: 14.0c + 89.2Gi
      #4 [2xl-x86iamx-88-900-h100-4]
         CPU:  92.5%  MEM:  95.3%  GPU: 100.0%  waste: 14.3c + 89.7Gi
      #5 [1xl-bx86iamx-176-1800-h100-8]
         CPU:  92.3%  MEM:  95.2%  GPU: 100.0%  waste: 14.6c + 90.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: p6-b200.48xlarge
  Total: 192 vCPU, 2048Gi advertised (1894.4Gi actual), 8 GPU
  Kubelet reserved: 550m CPU, 2.5Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 190930m CPU (190.9 cores), 1890.7Gi RAM, 8 GPU

  Runners targeting this node:
    - l-bx86iamx-176-1800-b200-8: 176320m CPU, 1800.5Gi RAM, 8 GPU (job: 176c+1800.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-22-225-b200: 22320m CPU, 225.5Gi RAM, 1 GPU (job: 22c+225.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-44-450-b200-2: 44320m CPU, 450.5Gi RAM, 2 GPU (job: 44c+450.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-88-900-b200-4: 88320m CPU, 900.5Gi RAM, 4 GPU (job: 88c+900.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iamx-176-1800-b200-8: 1 pods
      CPU:  92.3% (176320m / 190930m) waste: 14610m (14.6 cores)
      MEM:  95.2% (1800.5Gi / 1890.7Gi) waste: 90.2Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-22-225-b200: 8 pods
      CPU:  93.5% (178560m / 190930m) waste: 12370m (12.4 cores)
      MEM:  95.4% (1804.1Gi / 1890.7Gi) waste: 86.6Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-44-450-b200-2: 4 pods
      CPU:  92.9% (177280m / 190930m) waste: 13650m (13.7 cores)
      MEM:  95.3% (1802.0Gi / 1890.7Gi) waste: 88.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-88-900-b200-4: 2 pods
      CPU:  92.5% (176640m / 190930m) waste: 14290m (14.3 cores)
      MEM:  95.3% (1801.0Gi / 1890.7Gi) waste: 89.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 10

    Top 5 most efficient:
      #1 [8xl-x86iamx-22-225-b200]
         CPU:  93.5%  MEM:  95.4%  GPU: 100.0%  waste: 12.4c + 86.6Gi
      #2 [6xl-x86iamx-22-225-b200, 1xl-x86iamx-44-450-b200-2]
         CPU:  93.4%  MEM:  95.4%  GPU: 100.0%  waste: 12.7c + 87.1Gi
      #3 [4xl-x86iamx-22-225-b200, 2xl-x86iamx-44-450-b200-2]
         CPU:  93.2%  MEM:  95.4%  GPU: 100.0%  waste: 13.0c + 87.6Gi
      #4 [4xl-x86iamx-22-225-b200, 1xl-x86iamx-88-900-b200-4]
         CPU:  93.0%  MEM:  95.3%  GPU: 100.0%  waste: 13.3c + 88.2Gi
      #5 [2xl-x86iamx-22-225-b200, 3xl-x86iamx-44-450-b200-2]
         CPU:  93.0%  MEM:  95.3%  GPU: 100.0%  waste: 13.3c + 88.2Gi

    Bottom 5 least efficient (money on the table):
      #1 [2xl-x86iamx-22-225-b200, 1xl-x86iamx-44-450-b200-2, 1xl-x86iamx-88-900-b200-4]
         CPU:  92.9%  MEM:  95.3%  GPU: 100.0%  waste: 13.7c + 88.7Gi
      #2 [4xl-x86iamx-44-450-b200-2]
         CPU:  92.9%  MEM:  95.3%  GPU: 100.0%  waste: 13.7c + 88.7Gi
      #3 [2xl-x86iamx-44-450-b200-2, 1xl-x86iamx-88-900-b200-4]
         CPU:  92.7%  MEM:  95.3%  GPU: 100.0%  waste: 14.0c + 89.2Gi
      #4 [2xl-x86iamx-88-900-b200-4]
         CPU:  92.5%  MEM:  95.3%  GPU: 100.0%  waste: 14.3c + 89.7Gi
      #5 [1xl-bx86iamx-176-1800-b200-8]
         CPU:  92.3%  MEM:  95.2%  GPU: 100.0%  waste: 14.6c + 90.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: r7a.48xlarge
  Total: 192 vCPU, 1536Gi advertised (1420.8Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 1411.6Gi RAM

  Runners targeting this node:
    - l-x86iavx512-16-128: 16320m CPU, 128.5Gi RAM (job: 16c+128.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-32-256: 32320m CPU, 256.5Gi RAM (job: 32c+256.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-48-384: 48320m CPU, 384.5Gi RAM (job: 48c+384.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-8-64: 8320m CPU, 64.5Gi RAM (job: 8c+64.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-94-768: 94320m CPU, 740.5Gi RAM (job: 94c+740.0Gi, hooks: 320m+522Mi)
    - rel-l-x86iavx512-8-64: 8320m CPU, 64.5Gi RAM (job: 8c+64.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iavx512-16-128: 10 pods
      CPU:  85.4% (163200m / 191080m) waste: 27880m (27.9 cores)
      MEM:  91.0% (1285.1Gi / 1411.6Gi) waste: 126.5Gi
      Bottleneck: MEM
    l-x86iavx512-32-256: 5 pods
      CPU:  84.6% (161600m / 191080m) waste: 29480m (29.5 cores)
      MEM:  90.9% (1282.5Gi / 1411.6Gi) waste: 129.1Gi
      Bottleneck: CPU
    l-x86iavx512-48-384: 3 pods
      CPU:  75.9% (144960m / 191080m) waste: 46120m (46.1 cores)
      MEM:  81.7% (1153.5Gi / 1411.6Gi) waste: 258.1Gi
      Bottleneck: CPU
    l-x86iavx512-8-64: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  96.0% (1354.7Gi / 1411.6Gi) waste: 56.9Gi
      Bottleneck: MEM
    l-x86iavx512-94-768: 1 pods
      CPU:  49.4% (94320m / 191080m) waste: 96760m (96.8 cores)
      MEM:  52.5% (740.5Gi / 1411.6Gi) waste: 671.1Gi
      Bottleneck: MEM
    rel-l-x86iavx512-8-64: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  96.0% (1354.7Gi / 1411.6Gi) waste: 56.9Gi
      Bottleneck: MEM

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 572

    Top 5 most efficient:
      #1 [5xl-x86iavx512-16-128, 2xl-x86iavx512-48-384]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi
      #2 [4xl-x86iavx512-16-128, 2xl-x86iavx512-32-256, 1xl-x86iavx512-48-384]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi
      #3 [3xl-x86iavx512-16-128, 4xl-x86iavx512-32-256]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi
      #4 [2xl-x86iavx512-16-128, 1xl-x86iavx512-32-256, 2xl-x86iavx512-48-384, 2xl-x86iavx512-8-64]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi
      #5 [2xl-x86iavx512-16-128, 1xl-x86iavx512-32-256, 2xl-x86iavx512-48-384, 1xl-x86iavx512-8-64, 1xrel-l-x86iavx512-8-64]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi

    Bottom 5 least efficient (money on the table):
      #1 [1xl-x86iavx512-16-128, 1xl-x86iavx512-32-256, 2xl-x86iavx512-48-384, 3xrel-l-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi
      #2 [3xl-x86iavx512-32-256, 1xl-x86iavx512-48-384, 3xl-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi
      #3 [3xl-x86iavx512-32-256, 1xl-x86iavx512-48-384, 2xl-x86iavx512-8-64, 1xrel-l-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi
      #4 [3xl-x86iavx512-32-256, 1xl-x86iavx512-48-384, 1xl-x86iavx512-8-64, 2xrel-l-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi
      #5 [3xl-x86iavx512-32-256, 1xl-x86iavx512-48-384, 3xrel-l-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: r7g.16xlarge
  Total: 64 vCPU, 512Gi advertised (473.7Gi actual)
  Kubelet reserved: 230m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 63400m CPU (63.4 cores), 464.5Gi RAM

  Runners targeting this node:
    - l-arm64g3-61-463: 61320m CPU, 463.5Gi RAM (job: 61c+463.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-arm64g3-61-463: 1 pods
      CPU:  96.7% (61320m / 63400m) waste: 2080m (2.1 cores)
      MEM:  99.8% (463.5Gi / 464.5Gi) waste: 1.0Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-arm64g3-61-463]
         CPU:  96.7%  MEM:  99.8%  waste: 2.1c + 1.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: r7i.48xlarge
  Total: 192 vCPU, 1536Gi advertised (1420.8Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 1411.6Gi RAM

  Runners targeting this node:
    - l-x86iamx-16-128: 16320m CPU, 128.5Gi RAM (job: 16c+128.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-8-64: 8320m CPU, 64.5Gi RAM (job: 8c+64.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iamx-16-128: 10 pods
      CPU:  85.4% (163200m / 191080m) waste: 27880m (27.9 cores)
      MEM:  91.0% (1285.1Gi / 1411.6Gi) waste: 126.5Gi
      Bottleneck: MEM
    l-x86iamx-8-64: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  96.0% (1354.7Gi / 1411.6Gi) waste: 56.9Gi
      Bottleneck: MEM

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 10

    Top 5 most efficient:
      #1 [1xl-x86iamx-16-128, 19xl-x86iamx-8-64]
         CPU:  91.3%  MEM:  95.9%  waste: 16.7c + 57.4Gi
      #2 [2xl-x86iamx-16-128, 17xl-x86iamx-8-64]
         CPU:  91.1%  MEM:  95.9%  waste: 17.0c + 57.9Gi
      #3 [3xl-x86iamx-16-128, 15xl-x86iamx-8-64]
         CPU:  90.9%  MEM:  95.9%  waste: 17.3c + 58.4Gi
      #4 [4xl-x86iamx-16-128, 13xl-x86iamx-8-64]
         CPU:  90.8%  MEM:  95.8%  waste: 17.6c + 59.0Gi
      #5 [5xl-x86iamx-16-128, 11xl-x86iamx-8-64]
         CPU:  90.6%  MEM:  95.8%  waste: 18.0c + 59.5Gi

    Bottom 5 least efficient (money on the table):
      #1 [6xl-x86iamx-16-128, 9xl-x86iamx-8-64]
         CPU:  90.4%  MEM:  95.8%  waste: 18.3c + 60.0Gi
      #2 [7xl-x86iamx-16-128, 7xl-x86iamx-8-64]
         CPU:  90.3%  MEM:  95.7%  waste: 18.6c + 60.5Gi
      #3 [8xl-x86iamx-16-128, 5xl-x86iamx-8-64]
         CPU:  90.1%  MEM:  95.7%  waste: 18.9c + 61.0Gi
      #4 [9xl-x86iamx-16-128, 3xl-x86iamx-8-64]
         CPU:  89.9%  MEM:  95.6%  waste: 19.2c + 61.5Gi
      #5 [10xl-x86iamx-16-128, 1xl-x86iamx-8-64]
         CPU:  89.8%  MEM:  95.6%  waste: 19.6c + 62.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: t4g.2xlarge
  Total: 8 vCPU, 32Gi advertised (29.6Gi actual)
  Kubelet reserved: 90m CPU, 993Mi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 7540m CPU (7.5 cores), 27.7Gi RAM

  Runners targeting this node:
    - l-arm64g2-6-25: 6320m CPU, 25.5Gi RAM (job: 6c+25.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-arm64g2-6-25: 1 pods
      CPU:  83.8% (6320m / 7540m) waste: 1220m (1.2 cores)
      MEM:  92.0% (25.5Gi / 27.7Gi) waste: 2.2Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-arm64g2-6-25]
         CPU:  83.8%  MEM:  92.0%  waste: 1.2c + 2.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Found 14 runner type(s) with homogeneous utilization below 90.0%

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Unused resource headroom per node (homogeneous packing only):

  Node Type                 Min CPU    Max CPU    Min MEM    Max MEM
  ────────────────────────────────────────────────────────────────
  c7a.48xlarge              4480m     96760m      3.3Gi    156.5Gi
  c7i.12xlarge              1120m      5840m      530Mi      2.5Gi
  c7i.metal-24xl            3000m      3000m      936Mi      936Mi
  g4dn.12xlarge             1970m      1970m      1.0Gi      1.0Gi
  g4dn.8xlarge              2010m      2010m      712Mi      712Mi
  g4dn.metal                 530m     34850m      708Mi    302.2Gi
  g5.12xlarge               1970m      1970m      616Mi      616Mi
  g5.48xlarge               1610m      1610m      627Mi      627Mi
  g5.8xlarge                2010m      2010m      824Mi      824Mi
  g6.12xlarge               1970m      1970m      1.0Gi      1.0Gi
  g6.8xlarge                2010m      2010m      824Mi      824Mi
  m6i.32xlarge              1920m      1920m      1.0Gi      1.0Gi
  m7g.8xlarge              15160m     15160m     52.1Gi     52.1Gi
  m7g.metal                 1080m      1080m      1.2Gi      1.2Gi
  m7i.48xlarge             16360m     29800m     18.5Gi     59.2Gi
  m8g.16xlarge              1080m      1080m      1.2Gi      1.2Gi
  m8g.48xlarge             11560m     11560m     13.6Gi     13.6Gi
  p4d.24xlarge              4610m      6850m     56.9Gi     60.4Gi
  p5.48xlarge              12370m     14610m     86.6Gi     90.2Gi
  p6-b200.48xlarge         12370m     14610m     86.6Gi     90.2Gi
  r7a.48xlarge             16360m     96760m     56.9Gi    671.1Gi
  r7g.16xlarge              2080m      2080m      1.0Gi      1.0Gi
  r7i.48xlarge             16360m     27880m     56.9Gi    126.5Gi
  t4g.2xlarge               1220m      1220m      2.2Gi      2.2Gi
  ────────────────────────────────────────────────────────────────
  WORST CASE                 530m     96760m      530Mi    671.1Gi

  The tightest node has only 530m CPU and 530Mi RAM free.
  Any new DaemonSet must fit within these limits or runners will fail to schedule.
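
As a rough illustration of the rule above, the check for a proposed DaemonSet could look like the following. This is a sketch, not part of the simulator: `HEADROOM` holds only a subset of the table's Min CPU / Min MEM columns, and `tightest` and `daemonset_fits` are hypothetical helper names. Like the report's WORST CASE row, it takes the CPU and memory minima independently, so the two limits may come from different node types.

```python
MI_PER_GI = 1024

# Subset of the headroom table: node type -> (min free CPU in m, min free memory in Mi).
HEADROOM = {
    "c7i.12xlarge": (1120, 530),
    "g4dn.8xlarge": (2010, 712),
    "g4dn.metal": (530, 708),
    "t4g.2xlarge": (1220, round(2.2 * MI_PER_GI)),
}


def tightest(headroom):
    """Smallest free CPU and smallest free memory across all node types."""
    min_cpu_m = min(cpu for cpu, _ in headroom.values())
    min_mem_mi = min(mem for _, mem in headroom.values())
    return min_cpu_m, min_mem_mi


def daemonset_fits(cpu_m, mem_mi, headroom=HEADROOM):
    """True if the DaemonSet's requests fit within the worst-case headroom."""
    min_cpu_m, min_mem_mi = tightest(headroom)
    return cpu_m <= min_cpu_m and mem_mi <= min_mem_mi


print(tightest(HEADROOM))
print(daemonset_fits(cpu_m=100, mem_mi=128))
print(daemonset_fits(cpu_m=600, mem_mi=128))
```

A DaemonSet requesting 600m CPU would exceed the 530m free on the tightest node in this subset (g4dn.metal), so the last call reports that it does not fit.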

@github-actions

github-actions Bot commented Jun 12, 2026


tofu plan — arc-cbr-production

✅ Plan succeeded · commit 808e0dbd · run log

Plan output
Installed 1 package in 1ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


Acquiring state lock. This may take a few moments...
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=527854a4-e335-4f95-bc89-1321cff7a478]
data.aws_availability_zones.available: Reading...
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3]
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role]
module.eks.data.aws_caller_identity.current: Reading...
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0e712dc7e743bbcf7]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-node-role]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-2]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNOLQFN6MU]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-eks-secrets]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=pytorch-arc-cbr-production-node-role:pytorch-arc-cbr-production-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-032d4401e63f0c9b9]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-05e96ee7cb818e5c0]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-0d34063a19f4b07b4]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0ab11fcdb8d4ea113]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0d26e280575e8aaf4]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-0a583bbbcac436ebd]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0709abbcafa23aec0]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-01187bfaa68514400]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-01e479dcb5aedf696]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-0577a02acde719bff]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0992f582e9bf2836e]
module.vpc.aws_eip.nat_secondary["us-east-2c-2"]: Refreshing state... [id=eipalloc-07cfdb2fd5dc07459]
module.vpc.aws_eip.nat_secondary["us-east-2c-6"]: Refreshing state... [id=eipalloc-0aede78edc69cf695]
module.vpc.aws_eip.nat_secondary["us-east-2a-1"]: Refreshing state... [id=eipalloc-0f2b00a9ac31df215]
module.vpc.aws_eip.nat_secondary["us-east-2a-3"]: Refreshing state... [id=eipalloc-034d5e1f5a2fcb795]
module.vpc.aws_eip.nat_secondary["us-east-2b-4"]: Refreshing state... [id=eipalloc-0de33181548ac2e5a]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-009f1fe7d56695348]
module.vpc.aws_eip.nat_secondary["us-east-2b-3"]: Refreshing state... [id=eipalloc-021ee6c9f1d20b71a]
module.vpc.aws_eip.nat_secondary["us-east-2c-1"]: Refreshing state... [id=eipalloc-06a980076e99cda81]
module.vpc.aws_eip.nat_secondary["us-east-2c-0"]: Refreshing state... [id=eipalloc-03542e74755fc105b]
module.vpc.aws_eip.nat_secondary["us-east-2c-4"]: Refreshing state... [id=eipalloc-0cc3dadec18bbb3f3]
module.vpc.aws_eip.nat_secondary["us-east-2b-0"]: Refreshing state... [id=eipalloc-0cead990d60ce181e]
module.vpc.aws_eip.nat_secondary["us-east-2c-5"]: Refreshing state... [id=eipalloc-02825435a2786b3d8]
module.vpc.aws_eip.nat_secondary["us-east-2b-5"]: Refreshing state... [id=eipalloc-0cde9a6463901f1e1]
module.vpc.aws_eip.nat_secondary["us-east-2b-6"]: Refreshing state... [id=eipalloc-06b7b88826199a232]
module.vpc.aws_eip.nat_secondary["us-east-2a-6"]: Refreshing state... [id=eipalloc-0113c95dbdec2f879]
module.vpc.aws_eip.nat_secondary["us-east-2a-2"]: Refreshing state... [id=eipalloc-09b15a770e0c6d552]
module.vpc.aws_eip.nat_secondary["us-east-2a-5"]: Refreshing state... [id=eipalloc-0bd9bf54bd6010323]
module.vpc.aws_eip.nat_secondary["us-east-2b-1"]: Refreshing state... [id=eipalloc-0e67c0a8cd8c990da]
module.vpc.aws_eip.nat_secondary["us-east-2b-2"]: Refreshing state... [id=eipalloc-063bee447616351f9]
module.vpc.aws_eip.nat_secondary["us-east-2a-0"]: Refreshing state... [id=eipalloc-086a011b3c26c0dd7]
module.vpc.aws_eip.nat_secondary["us-east-2a-4"]: Refreshing state... [id=eipalloc-067d535102a61d1a8]
module.vpc.aws_eip.nat_secondary["us-east-2c-3"]: Refreshing state... [id=eipalloc-0d3a71569b2f687be]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0fddf2f74e7e978c7]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-084975a7f7af2696e]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-0ce4fba002d90e7d5]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-07d5cd4c479c827ab]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-08e264cbbd47be1ee]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-0f7b8f4473e5790df]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0ad75b2f5282877db]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production:kube-proxy]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production:vpc-cni]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-0c7ecd4166a01e5f0]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-0cb3785c433ed7718]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-01d38d41a7ca82a08]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-0b820cd15307b6d57]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=033a163afb2babc26f7883e642621ac361c93d61]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-east-2.amazonaws.com/id/0A621339248958D6D5F2FF084BD185B5]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0beb143017359bda1]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-0b6e08b4b0dc968c0]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-097abe4676c74f71b]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production:pytorch-arc-cbr-production-base-nodes]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=2879363015]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production:coredns]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (arc-cbr-production) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-rebalance]
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-instance-state-change]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-scheduled-change]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-spot-interruption]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/308535385114/pytorch-arc-cbr-production-karpenter]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/308535385114/pytorch-arc-cbr-production-karpenter]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-instance-state-change-KarpenterInstanceStateChange]
data.terraform_remote_state.base: Read complete after 2s
aws_ec2_tag.subnet_karpenter_discovery["subnet-0992f582e9bf2836e"]: Refreshing state... [id=subnet-0992f582e9bf2836e,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0709abbcafa23aec0"]: Refreshing state... [id=subnet-0709abbcafa23aec0,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0577a02acde719bff"]: Refreshing state... [id=subnet-0577a02acde719bff,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-karpenter-controller]
aws_iam_role.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-controller]
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-01ec5f742ae028981,karpenter.sh/discovery]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-controller-20260518021844404100000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (arc-cbr-production) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-pypi-wants-collector-s3]
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-pypi-wheel-syncer-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-0deb818bbf18764de]
data.terraform_remote_state.base: Read complete after 1s
aws_iam_role.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wheel-syncer-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-efs-csi-driver-role]
aws_iam_role.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wants-collector-role]
aws_security_group.efs: Refreshing state... [id=sg-0979eb5e3d9d3db9f]
aws_efs_mount_target.pypi_cache["subnet-0709abbcafa23aec0"]: Refreshing state... [id=fsmt-08cd5108febbacef9]
aws_efs_mount_target.pypi_cache["subnet-0992f582e9bf2836e"]: Refreshing state... [id=fsmt-03523586bb4ff0c46]
aws_efs_mount_target.pypi_cache["subnet-0577a02acde719bff"]: Refreshing state... [id=fsmt-07d7b111b9cd6684e]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wheel-syncer-role-20260518023249929400000004]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-efs-csi-driver-role-20260518023249955700000005]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wants-collector-role-20260518023249903900000003]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production:aws-efs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

@github-actions

github-actions Bot commented Jun 12, 2026

tofu plan — arc-cbr-production-uw1

✅ Plan succeeded · commit 808e0dbd · run log

Plan output
Installed 1 package in 2ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod-uw1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production-uw1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.eks.data.aws_caller_identity.current: Reading...
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0121d1038d393182a]
data.aws_availability_zones.available: Reading...
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role]
module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role]
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-s3]
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=1fb5d763-c5cd-4de5-bf40-712df992288c]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNFWBLKNFS]
data.aws_availability_zones.available: Read complete after 1s [id=us-west-1]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-uw1-eks-secrets]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role:pytorch-arc-cbr-production-uw1-node-cni-ipv6]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-07fd8394a1d58b614]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-0b3b22b995e71d8d9]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-07b06397ce403fa53]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-harbor-registry]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0a13e7b49c841e497]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0a8410ffa0f0014a7]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0bd275a35f8e7ef65]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-08861bee27120b994]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-05f5edbf2c6678c03]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0ce35bb011df0cfdb]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-06d137da3460167c4]
module.vpc.aws_eip.nat_secondary["us-west-1c-5"]: Refreshing state... [id=eipalloc-0635efedc10ee5f66]
module.vpc.aws_eip.nat_secondary["us-west-1a-0"]: Refreshing state... [id=eipalloc-0e3ca79e34012a238]
module.vpc.aws_eip.nat_secondary["us-west-1a-6"]: Refreshing state... [id=eipalloc-08763a35db0a26caa]
module.vpc.aws_eip.nat_secondary["us-west-1a-4"]: Refreshing state... [id=eipalloc-0dfae88698dce850e]
module.vpc.aws_eip.nat_secondary["us-west-1c-3"]: Refreshing state... [id=eipalloc-09f89978685e7f3c7]
module.vpc.aws_eip.nat_secondary["us-west-1a-5"]: Refreshing state... [id=eipalloc-059986f686b188dc2]
module.vpc.aws_eip.nat_secondary["us-west-1c-2"]: Refreshing state... [id=eipalloc-0f2e15b6a36b52fac]
module.vpc.aws_eip.nat_secondary["us-west-1c-6"]: Refreshing state... [id=eipalloc-0cf91a032d10f4ec5]
module.vpc.aws_eip.nat_secondary["us-west-1c-4"]: Refreshing state... [id=eipalloc-0dfaa16c61333ceb3]
module.vpc.aws_eip.nat_secondary["us-west-1a-3"]: Refreshing state... [id=eipalloc-05a2bad636af56f4d]
module.vpc.aws_eip.nat_secondary["us-west-1a-2"]: Refreshing state... [id=eipalloc-0647e169131be5893]
module.vpc.aws_eip.nat_secondary["us-west-1c-0"]: Refreshing state... [id=eipalloc-0d565f5bf077b05cf]
module.vpc.aws_eip.nat_secondary["us-west-1c-1"]: Refreshing state... [id=eipalloc-0bd09c7f2dcaa0a46]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-s3-20260519191031756900000001]
module.vpc.aws_eip.nat_secondary["us-west-1a-1"]: Refreshing state... [id=eipalloc-012ac413772344fea]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-00184fa8d73e575c9]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0f79a2ac72857a304]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production-uw1]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0c336634317cc9f35]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-01ec520e3931f5f6a]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-01165f36472c0a780]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-06e17b37b87d890f2]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production-uw1:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production-uw1:kube-proxy]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production-uw1:vpc-cni]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-066ae5f473a2b07c0]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0cc835aef3e3bcc21]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-02e4c54e5fa3b4f8a]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production-uw1:pytorch-arc-cbr-production-uw1-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 1s [id=ab5db6c82031e2d229412c67921160a3b3af073b]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-west-1.amazonaws.com/id/ED52EC64FF5CFAB4151C6E4B5DE279BD]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=3969145930]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-uw1-ebs-csi-driver-role]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production-uw1#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production-uw1:coredns]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-uw1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (arc-cbr-production-uw1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-instance-state-change]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-spot-interruption]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-scheduled-change]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-west-1.amazonaws.com/308535385114/pytorch-arc-cbr-production-uw1-karpenter]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-rebalance]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-west-1.amazonaws.com/308535385114/pytorch-arc-cbr-production-uw1-karpenter]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-spot-interruption-KarpenterSpotInterruption]
data.terraform_remote_state.base: Read complete after 1s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-058909cc1cdc63fad,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0a13e7b49c841e497"]: Refreshing state... [id=subnet-0a13e7b49c841e497,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-08861bee27120b994"]: Refreshing state... [id=subnet-08861bee27120b994,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-karpenter-controller]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-controller-20260519195229107000000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (arc-cbr-production-uw1) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-s3]
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-pypi-wants-collector-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-0da5eaf2022d80aa0]
data.terraform_remote_state.base: Read complete after 1s
aws_iam_role.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1-efs-csi-driver-role]
aws_iam_role.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wants-collector-role]
aws_iam_role.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-role]
aws_security_group.efs: Refreshing state... [id=sg-01c1f3fa51705db76]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1-efs-csi-driver-role-20260519200350826400000005]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-role-20260519200350777100000003]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wants-collector-role-20260519200350781900000004]
aws_efs_mount_target.pypi_cache["subnet-08861bee27120b994"]: Refreshing state... [id=fsmt-00708cc923d4d2055]
aws_efs_mount_target.pypi_cache["subnet-0a13e7b49c841e497"]: Refreshing state... [id=fsmt-089fd42858a5a85ab]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1:aws-efs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

@github-actions

github-actions Bot commented Jun 12, 2026

tofu plan — meta-prod-aws-ue1

✅ Plan succeeded · commit 808e0dbd · run log

Plan output
Installed 1 package in 1ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod-ue1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (meta-prod-aws-ue1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=meta-prod-aws-ue1-harbor-s3]
module.eks.aws_iam_role.cluster: Refreshing state... [id=meta-prod-aws-ue1-cluster-role]
module.eks.aws_iam_role.node: Refreshing state... [id=meta-prod-aws-ue1-node-role]
module.eks.data.aws_caller_identity.current: Reading...
data.aws_availability_zones.available: Reading...
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=9274017b-776a-41bd-9f11-d118a1174159]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-046818728dce02486]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/meta-prod-aws-ue1-eks-secrets]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNGRUDTXPT]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-1]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=meta-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=meta-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=meta-prod-aws-ue1-node-role:meta-prod-aws-ue1-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=meta-prod-aws-ue1-harbor-s3/arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-0dafeb02304897431]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-0cf3d9cf37ee998b6]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-0ce44cb6446f3c1b6]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-02ce11d6646870431]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0beb5fc44f0ee165f]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0348c5058db524cd2]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0d65ec2dd49f0d87c]
module.vpc.aws_eip.nat_secondary["us-east-1a-0"]: Refreshing state... [id=eipalloc-0c8a6faed0a97479d]
module.vpc.aws_eip.nat_secondary["us-east-1c-0"]: Refreshing state... [id=eipalloc-05844040c7248f44f]
module.vpc.aws_eip.nat_secondary["us-east-1c-2"]: Refreshing state... [id=eipalloc-025ef0e1813277c67]
module.vpc.aws_eip.nat_secondary["us-east-1b-4"]: Refreshing state... [id=eipalloc-0aba12aa23c11d20c]
module.vpc.aws_eip.nat_secondary["us-east-1a-5"]: Refreshing state... [id=eipalloc-01f89a7c130d2a810]
module.vpc.aws_eip.nat_secondary["us-east-1c-4"]: Refreshing state... [id=eipalloc-00c5df9f3b60f353d]
module.vpc.aws_eip.nat_secondary["us-east-1c-1"]: Refreshing state... [id=eipalloc-0cb5208c5f775baf6]
module.vpc.aws_eip.nat_secondary["us-east-1a-6"]: Refreshing state... [id=eipalloc-02e84a51a14c9cbda]
module.vpc.aws_eip.nat_secondary["us-east-1a-3"]: Refreshing state... [id=eipalloc-0bda13d7b70c00c00]
module.vpc.aws_eip.nat_secondary["us-east-1b-0"]: Refreshing state... [id=eipalloc-0bcfe1f98793e1b12]
module.vpc.aws_eip.nat_secondary["us-east-1a-1"]: Refreshing state... [id=eipalloc-08c7bd3306cf687ca]
module.vpc.aws_eip.nat_secondary["us-east-1b-6"]: Refreshing state... [id=eipalloc-0f922f499d32f1368]
module.vpc.aws_eip.nat_secondary["us-east-1b-1"]: Refreshing state... [id=eipalloc-0d095305019486ae6]
module.vpc.aws_eip.nat_secondary["us-east-1b-2"]: Refreshing state... [id=eipalloc-0f0b720f4cca62ec7]
module.vpc.aws_eip.nat_secondary["us-east-1a-4"]: Refreshing state... [id=eipalloc-09fa171393c3a7cfb]
module.vpc.aws_eip.nat_secondary["us-east-1b-3"]: Refreshing state... [id=eipalloc-0c8291ee817240e1f]
module.vpc.aws_eip.nat_secondary["us-east-1c-3"]: Refreshing state... [id=eipalloc-0af54aa2e5f40dfa4]
module.vpc.aws_eip.nat_secondary["us-east-1a-2"]: Refreshing state... [id=eipalloc-080ec4e265ebdc5ad]
module.vpc.aws_eip.nat_secondary["us-east-1b-5"]: Refreshing state... [id=eipalloc-0d078dc6f07628714]
module.vpc.aws_eip.nat_secondary["us-east-1c-5"]: Refreshing state... [id=eipalloc-04fe645562f597aaa]
module.vpc.aws_eip.nat_secondary["us-east-1c-6"]: Refreshing state... [id=eipalloc-0d22d3aa0667a1070]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0eafd792589fbb363]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-00c2e2605c4dea199]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-033772b4490df1b41]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-078f44b58c8b48ade]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-07bfd0f170c3b3406]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0f922406e02ecba1d]
module.eks.aws_eks_cluster.this: Refreshing state... [id=meta-prod-aws-ue1]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-05e7e66e960593972]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-05da47c4ed26ae390]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0616491b7baeab47f]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=meta-prod-aws-ue1:kube-proxy]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=meta-prod-aws-ue1:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=meta-prod-aws-ue1:vpc-cni]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-043779597e3b5a7fd]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-09414719983019b49]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-025de56c0aac8d3f0]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0cff785d8001fc914]
module.eks.aws_eks_node_group.base: Refreshing state... [id=meta-prod-aws-ue1:meta-prod-aws-ue1-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 1s [id=b1b539daa206035ae3c3e28288b0681fa1b462f3]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/6C84A48E1BF23A027C1E78912A368743]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-05d5b7a41aa6323ed]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-0c665948be8d0282e]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-09287d705ce4a88bc]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=3022997555]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=meta-prod-aws-ue1-ebs-csi-driver-role]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-02a8683fa7258f295]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-09dca398d838d4247]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-0306281246323bd27]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=meta-prod-aws-ue1:coredns]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=meta-prod-aws-ue1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1:aws-ebs-csi-driver]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry/arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=meta-prod-aws-ue1#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (meta-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-instance-state-change]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=meta-prod-aws-ue1-karpenter-rebalance]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=meta-prod-aws-ue1-karpenter-spot-interruption]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/308535385114/meta-prod-aws-ue1-karpenter]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-scheduled-change]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/308535385114/meta-prod-aws-ue1-karpenter]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=meta-prod-aws-ue1-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=meta-prod-aws-ue1-karpenter-spot-interruption-KarpenterSpotInterruption]
data.terraform_remote_state.base: Read complete after 2s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-016f4a0d209f3e4a9,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0348c5058db524cd2"]: Refreshing state... [id=subnet-0348c5058db524cd2,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0d65ec2dd49f0d87c"]: Refreshing state... [id=subnet-0d65ec2dd49f0d87c,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=meta-prod-aws-ue1-karpenter-controller]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-02ce11d6646870431"]: Refreshing state... [id=subnet-02ce11d6646870431,karpenter.sh/discovery]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=meta-prod-aws-ue1-karpenter-controller-20260528200455768400000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (meta-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-pypi-wants-collector-s3]
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-pypi-wheel-syncer-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-023e57b36ec1cd426]
data.terraform_remote_state.base: Read complete after 1s
aws_iam_role.wants_collector: Refreshing state... [id=meta-prod-aws-ue1-pypi-wants-collector-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1-efs-csi-driver-role]
aws_security_group.efs: Refreshing state... [id=sg-0bc06caa62214c9b7]
aws_iam_role.wheel_syncer: Refreshing state... [id=meta-prod-aws-ue1-pypi-wheel-syncer-role]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=meta-prod-aws-ue1-pypi-wants-collector-role-20260528201106192600000004]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=meta-prod-aws-ue1-pypi-wheel-syncer-role-20260528201106257700000005]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1-efs-csi-driver-role-20260528201106116400000003]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1:aws-efs-csi-driver]
aws_efs_mount_target.pypi_cache["subnet-02ce11d6646870431"]: Refreshing state... [id=fsmt-06a05c001541338d2]
aws_efs_mount_target.pypi_cache["subnet-0348c5058db524cd2"]: Refreshing state... [id=fsmt-0500c573cafe66133]
aws_efs_mount_target.pypi_cache["subnet-0d65ec2dd49f0d87c"]: Refreshing state... [id=fsmt-0ffaedc58eceb7749]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

[ghstack-poisoned]
georgehong added a commit that referenced this pull request Jun 15, 2026
Temporary test configuration — DO NOT MERGE. Parallel to the A100 [TEST-ONLY]
commit: the same NUMA pipeline, validated on g4dn.metal (T4) because AWS does
not offer A100/p4d in us-west-1 (arc-staging's region), whereas g4dn.metal is
available on-demand there.

- Add nfd + numa-scheduler to arc-staging modules; remove from prod clusters
- Point NFD topology-updater + taint-remover nodeSelector at g4dn-metal-numa
- Gate the nfd-topology startup taint on the g4dn-metal-numa fleet
- Add g4dn-metal-numa nodepool: single-numa-node, capped to ONE node
  (limits.nvidia.com/gpu: 8) so 1-GPU + 4-GPU runners pack one 2-NUMA box
- Add a nodepool `limits` passthrough to generate_nodepools.py
- Add 1-GPU and 4-GPU T4 runner defs (4-GPU uses scheduler_name: numa-scheduler)
- Add cleanup-arc-staging.sh for teardown

g4dn.metal = 2 sockets x 4 T4 (2 NUMA x 4 GPU), topologically identical to
p4d/p5. cpuManagerPolicy=static and Guaranteed-QoS runner pods are already in
place, so CPU+GPU NUMA alignment applies without workload changes.
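
As an illustration of the packing claim above, here is a minimal sketch of how 1-GPU and 4-GPU runners fit on one box with 2 NUMA nodes of 4 GPUs each when every pod must draw all its GPUs from a single NUMA node. It is a simplified model, not the kubelet Topology Manager or numa-scheduler algorithm. The function name `pack_single_numa` and the first-fit-decreasing heuristic are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class NumaNode:
    """One NUMA node with a fixed number of free GPUs (illustrative model)."""
    free_gpus: int
    pods: list = field(default_factory=list)


def pack_single_numa(requests, numa_nodes=2, gpus_per_numa=4):
    """Place each GPU request entirely within one NUMA node.

    Uses first-fit decreasing: largest requests are placed first, each on the
    first NUMA node with enough free GPUs. Returns the per-NUMA list of placed
    requests, or None if this heuristic finds no single-NUMA placement.
    """
    nodes = [NumaNode(gpus_per_numa) for _ in range(numa_nodes)]
    for gpus in sorted(requests, reverse=True):
        target = next((n for n in nodes if n.free_gpus >= gpus), None)
        if target is None:
            return None  # request would have to span NUMA nodes
        target.free_gpus -= gpus
        target.pods.append(gpus)
    return [n.pods for n in nodes]


if __name__ == "__main__":
    # One 4-GPU runner fills NUMA 0; 1-GPU runners land on NUMA 1.
    print(pack_single_numa([1, 4]))
    print(pack_single_numa([4, 1, 1, 1, 1]))
    # Two 4-GPU runners plus a 1-GPU runner do not fit on one box.
    print(pack_single_numa([4, 4, 1]))
```

Under this model a 4-GPU runner takes one whole NUMA node, leaving the other node for up to four 1-GPU runners, which is the mix the single capped node is meant to exercise.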

Cleanup: bash modules/nfd/scripts/cleanup-arc-staging.sh
Then drop this commit: git checkout numa-aware-scheduling (or git reset --hard HEAD~1)

ghstack-source-id: a9e35b6
Pull-Request: #748
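
The nodepool `limits` passthrough mentioned in the commit message could look roughly like the sketch below. The actual `generate_nodepools.py` is not shown in this thread, so the function name `build_nodepool`, its parameters, and the manifest layout are hypothetical. The `karpenter.sh/v1` API version is also an assumption about the cluster's Karpenter release, and the manifest is deliberately incomplete (for example, it has no `nodeClassRef`).

```python
def build_nodepool(name, instance_types, limits=None):
    """Build a Karpenter-style NodePool manifest as a plain dict.

    `limits` is copied through unchanged into spec.limits when provided;
    when it is omitted or empty, no limits key is emitted.
    Hypothetical helper: not the repository's real generator.
    """
    manifest = {
        "apiVersion": "karpenter.sh/v1",  # assumed API version
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "requirements": [
                        {
                            "key": "node.kubernetes.io/instance-type",
                            "operator": "In",
                            "values": list(instance_types),
                        }
                    ]
                }
            }
        },
    }
    if limits:
        # Passthrough: the generator does not interpret the resource names.
        manifest["spec"]["limits"] = dict(limits)
    return manifest


if __name__ == "__main__":
    pool = build_nodepool(
        "g4dn-metal-numa",
        ["g4dn.metal"],
        limits={"nvidia.com/gpu": 8},
    )
    print(pool["spec"]["limits"])
```

Because a g4dn.metal node exposes 8 GPUs, a pool-wide limit of `nvidia.com/gpu: 8` is what keeps the pool at a single node in this test setup.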
[ghstack-poisoned]
georgehong added a commit that referenced this pull request Jun 15, 2026
ghstack-source-id: 7af6767
Pull-Request: #748
[ghstack-poisoned]
georgehong added a commit that referenced this pull request Jun 15, 2026
ghstack-source-id: 7fa073e
Pull-Request: #748
[ghstack-poisoned]
georgehong added a commit that referenced this pull request Jun 15, 2026
ghstack-source-id: c63197b
Pull-Request: #748