[TEST-ONLY] Enable NUMA modules on arc-staging with g4dn.metal (T4) by georgehong · Pull Request #748 · pytorch/ci-infra

georgehong · 2026-06-12T21:09:12Z

Stack from ghstack (oldest at bottom):

Temporary test configuration — DO NOT MERGE. Parallel to the A100 [TEST-ONLY]
commit: the same NUMA pipeline, validated on g4dn.metal (T4) because AWS does
not offer A100/p4d in us-west-1 (arc-staging's region), whereas g4dn.metal is
available on-demand there.

- Add nfd + numa-scheduler to arc-staging modules; remove from prod clusters
- Point NFD topology-updater + taint-remover nodeSelector at g4dn-metal-numa
- Gate the nfd-topology startup taint on the g4dn-metal-numa fleet
- Add g4dn-metal-numa nodepool: single-numa-node, capped to ONE node
  (limits.nvidia.com/gpu: 8) so 1-GPU + 4-GPU runners pack one 2-NUMA box
- Add a nodepool limits passthrough to generate_nodepools.py
- Add 1-GPU and 4-GPU T4 runner defs (4-GPU uses scheduler_name: numa-scheduler)
- Add cleanup-arc-staging.sh for teardown

g4dn.metal = 2 sockets x 4 T4 (2 NUMA x 4 GPU), topologically identical to
p4d/p5. cpuManagerPolicy=static and Guaranteed-QoS runner pods are already in
place, so CPU+GPU NUMA alignment applies without workload changes.
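
As a rough illustration of what single-numa-node placement means on a 2 NUMA x 4 GPU box, the sketch below models GPU placement only. It is a simplified model, not the scheduler's or kubelet's code: the name `place_gpu_pods` and the first-fit strategy are assumptions made for illustration, and CPU and memory alignment are ignored.

```python
from typing import List, Optional


def place_gpu_pods(requests: List[int], zones: int = 2,
                   gpus_per_zone: int = 4) -> Optional[List[int]]:
    """First-fit placement where each pod's GPUs must come from one NUMA zone.

    Returns the zone index chosen for each request, or None if some
    request cannot be satisfied from a single zone.
    """
    free = [gpus_per_zone] * zones
    placement = []
    for want in requests:
        for zone in range(zones):
            if free[zone] >= want:
                free[zone] -= want
                placement.append(zone)
                break
        else:
            # No single zone has enough free GPUs for this pod.
            return None
    return placement


print(place_gpu_pods([4, 1]))     # [0, 1]: 4-GPU pod on zone 0, 1-GPU pod on zone 1
print(place_gpu_pods([3, 3, 2]))  # None: 2 GPUs are free in total, but split across zones
```

The second call shows why total free GPUs is not enough under this policy: the remaining capacity has to sit inside one zone.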

Cleanup: bash modules/nfd/scripts/cleanup-arc-staging.sh
Then drop this commit: git checkout numa-aware-scheduling (or git reset --hard HEAD~1)

[ghstack-poisoned]

github-actions · 2026-06-12T21:09:53Z

Capacity report

commit 808e0dbd · run log

✅ simulate-cluster

Installed 1 package in 1ms
Monte Carlo Cluster Simulation
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Seed: 42  |  MAPE threshold: 15%  |  Runners: 45  |  DaemonSets: 17
Peak target runner types: 30 (mapped from 38 old labels)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cluster Simulation Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Skipped labels (1):
  l-arm64g2-6-32: no runner def

Nodes by instance type:

  Instance Type          Nodes  vCPU Used vCPU Total   Mem Used  Mem Total   GPU
  ──────────────────────────────────────────────────────────────────────────────
  c7a.48xlarge             261   44794.2c   49871.9c  87800.8Gi  90312.1Gi     -
  c7i.metal-24xl            37    3415.8c    3526.8c   6197.9Gi   6231.7Gi     -
  g4dn.12xlarge            162    7341.8c    7661.0c  27946.6Gi  28109.4Gi 648/648
  g4dn.8xlarge              89    2609.5c    2788.4c  10280.4Gi  10342.3Gi 89/89
  g4dn.metal                87    8205.8c    8279.8c  29972.3Gi  30076.9Gi 696/696
  g5.12xlarge               49    2220.7c    2317.2c   8208.0Gi   8237.5Gi 196/196
  g5.48xlarge               41    7762.1c    7828.1c  28884.9Gi  28910.0Gi 328/328
  g5.8xlarge               603   17680.0c   18892.0c  68446.4Gi  68931.6Gi 603/603
  g6.12xlarge               24    1087.7c    1135.0c   4140.2Gi   4164.4Gi 96/96
  g6.8xlarge               377   11053.6c   11811.4c  42793.2Gi  43096.5Gi 377/377
  m6i.32xlarge              26    3258.3c    3308.2c  12051.3Gi  12077.9Gi     -
  m7g.8xlarge               61     995.5c    1920.3c   3813.1Gi   6992.2Gi     -
  m7g.metal                 30    1869.6c    1902.0c   6795.3Gi   6830.4Gi     -
  m7i.48xlarge              48    8192.0c    9171.8c  32363.6Gi  33658.7Gi     -
  m8g.48xlarge               7    1093.4c    1337.6c   4188.2Gi   4908.6Gi     -
  r7a.48xlarge             137   21506.4c   26178.0c 170772.3Gi 193392.5Gi     -
  r7g.16xlarge             122    7481.0c    7734.8c  56548.2Gi  56673.4Gi     -

Deployment accuracy:

  Total deployed: 6208 / 7294 target
  Weighted MAPE: 15.0%

  Runner                              Deployed   Target     Diff
  ───────────────────────────────────────────────────────────────
  l-arm64g3-16-62                           61       76      -15
  l-arm64g3-61-463                         122      153      -31
  l-arm64g4-16-62                           67       76       -9
  l-barm64g3-62-226                         30       39       -9
  l-bx86iamx-92-167                         37       45       -8
  l-bx86iavx512-94-344-t4-8                 87       91       -4
  l-x86aavx2-189-704-a10g-8                 41       42       -1
  l-x86aavx2-29-113-a10g                   603      695      -92
  l-x86aavx2-29-113-l4                     377      422      -45
  l-x86aavx2-45-167-a10g-4                  49       80      -31
  l-x86aavx2-45-172-l4-4                    24       29       -5
  l-x86aavx512-125-463                      26       24       +2
  l-x86iamx-32-128                         130      174      -44
  l-x86iamx-8-32                           354      384      -30
  l-x86iavx2-40-160                         22       30       -8
  l-x86iavx2-8-32                           19       18       +1
  l-x86iavx512-16-128                       68       89      -21
  l-x86iavx512-16-32                      1146     1384     -238
  l-x86iavx512-2-4                          12       15       -3
  l-x86iavx512-29-115-t4                    89      104      -15
  l-x86iavx512-32-256                       13       12       +1
  l-x86iavx512-37-68                        48       65      -17
  l-x86iavx512-45-172-t4-4                 162      183      -21
  l-x86iavx512-46-85                       151      189      -38
  l-x86iavx512-48-384                      366      417      -51
  l-x86iavx512-8-16                       2054     2400     -346
  l-x86iavx512-8-64                         26       28       -2
  l-x86iavx512-94-192                        2        2       +0
  l-x86iavx512-94-768                       22       28       -6
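
The report does not say how the weighted MAPE is computed. One definition that reproduces the 15.0% figure from the rows above is a target-weighted mean absolute percentage error, which reduces to the sum of absolute differences divided by the sum of targets. The sketch below assumes that definition; `weighted_mape` is a hypothetical name, not the simulator's function, and the skipped label is not included.

```python
from typing import List, Tuple

# (deployed, target) pairs copied from the per-runner table above.
ROWS: List[Tuple[int, int]] = [
    (61, 76), (122, 153), (67, 76), (30, 39), (37, 45), (87, 91),
    (41, 42), (603, 695), (377, 422), (49, 80), (24, 29), (26, 24),
    (130, 174), (354, 384), (22, 30), (19, 18), (68, 89), (1146, 1384),
    (12, 15), (89, 104), (13, 12), (48, 65), (162, 183), (151, 189),
    (366, 417), (2054, 2400), (26, 28), (2, 2), (22, 28),
]


def weighted_mape(rows: List[Tuple[int, int]]) -> float:
    """Target-weighted MAPE: sum(|deployed - target|) / sum(target)."""
    total_target = sum(t for _, t in rows)
    abs_error = sum(abs(d - t) for d, t in rows)
    return abs_error / total_target


deployed = sum(d for d, _ in ROWS)
target = sum(t for _, t in ROWS)
print(f"Total deployed: {deployed} / {target} target")  # 6208 / 7294
print(f"Weighted MAPE: {weighted_mape(ROWS):.1%}")      # 15.0%
```

Under this assumed definition the run sits right at the 15% MAPE threshold shown in the header.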

Cluster-wide utilization:

  vCPU:    90.9%  (150568 / 165664 cores)
  Memory:  95.0%  (601203 / 632946 GiB)
  GPU:    100.0%  (3033 / 3033 GPUs across 1432 nodes)

  Total nodes: 2161
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ analyze-utilization

Installed 1 package in 1ms
Node Utilization Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Runner def dirs: /home/runner/work/ci-infra/ci-infra/osdc/modules/arc-runners/defs, /home/runner/work/ci-infra/ci-infra/osdc/modules/arc-runners-b200/defs, /home/runner/work/ci-infra/ci-infra/osdc/modules/arc-runners-h100/defs
NodePool def dirs: /home/runner/work/ci-infra/ci-infra/osdc/modules/nodepools/defs, /home/runner/work/ci-infra/ci-infra/osdc/modules/nodepools-b200/defs, /home/runner/work/ci-infra/ci-infra/osdc/modules/nodepools-h100/defs
Utilization threshold: 90.0%

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: c7a.48xlarge
  Total: 192 vCPU, 384Gi advertised (355.2Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 346.0Gi RAM

  Runners targeting this node:
    - l-x86iavx512-16-32: 16320m CPU, 32.5Gi RAM (job: 16c+32.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-2-4: 2320m CPU, 4.5Gi RAM (job: 2c+4.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-37-68: 37320m CPU, 68.5Gi RAM (job: 37c+68.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-46-85: 46320m CPU, 85.5Gi RAM (job: 46c+85.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-8-16: 8320m CPU, 16.5Gi RAM (job: 8c+16.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-94-192: 94320m CPU, 189.5Gi RAM (job: 94c+189.0Gi, hooks: 320m+522Mi)
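
The figures above are consistent with simple subtraction: allocatable capacity is the node total minus the kubelet reservation and the DaemonSet overhead, and each runner's request is its job size plus the hook overhead. A minimal sketch of that arithmetic for c7a.48xlarge, in millicores and MiB; the helper name `allocatable` is illustrative and not taken from the analysis script.

```python
GIB = 1024  # MiB per GiB


def allocatable(total: float, kubelet_reserved: float, daemonset_overhead: float) -> float:
    """Capacity left for runner pods after node-level reservations."""
    return total - kubelet_reserved - daemonset_overhead


# c7a.48xlarge values from the report.
cpu_m = allocatable(192 * 1000, 550, 370)          # millicores
mem_mi = allocatable(355.2 * GIB, 8.3 * GIB, 934)  # MiB

# Runner request = job size + hook overhead (l-x86iavx512-16-32).
runner_cpu_m = 16 * 1000 + 320
runner_mem_mi = 32 * GIB + 522

print(cpu_m)                          # 191080
print(round(mem_mi / GIB, 1))         # 346.0
print(runner_cpu_m)                   # 16320
print(round(runner_mem_mi / GIB, 1))  # 32.5
```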

  Homogeneous packing (single runner type fills the node):
    l-x86iavx512-16-32: 10 pods
      CPU:  85.4% (163200m / 191080m) waste: 27880m (27.9 cores)
      MEM:  94.0% (325.1Gi / 346.0Gi) waste: 20.9Gi
      Bottleneck: MEM
    l-x86iavx512-2-4: 76 pods
      CPU:  92.3% (176320m / 191080m) waste: 14760m (14.8 cores)
      MEM:  99.1% (342.7Gi / 346.0Gi) waste: 3.3Gi
      Bottleneck: MEM
    l-x86iavx512-37-68: 5 pods
      CPU:  97.7% (186600m / 191080m) waste: 4480m (4.5 cores)
      MEM:  99.0% (342.5Gi / 346.0Gi) waste: 3.5Gi
      Bottleneck: CPU
    l-x86iavx512-46-85: 4 pods
      CPU:  97.0% (185280m / 191080m) waste: 5800m (5.8 cores)
      MEM:  98.8% (342.0Gi / 346.0Gi) waste: 4.0Gi
      Bottleneck: CPU
    l-x86iavx512-8-16: 20 pods
      CPU:  87.1% (166400m / 191080m) waste: 24680m (24.7 cores)
      MEM:  95.4% (330.2Gi / 346.0Gi) waste: 15.8Gi
      Bottleneck: MEM
    l-x86iavx512-94-192: 1 pods
      CPU:  49.4% (94320m / 191080m) waste: 96760m (96.8 cores)
      MEM:  54.8% (189.5Gi / 346.0Gi) waste: 156.5Gi
      Bottleneck: MEM
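
The pod counts and bottleneck labels above can be reproduced by dividing each allocatable resource by the per-pod request and taking the minimum. The sketch below assumes the bottleneck is the resource with the smaller quotient and that ties go to CPU; that tie-break is inferred from the report's output, not confirmed against the tool's source. Memory uses the rounded 346.0Gi figure as printed.

```python
GIB = 1024  # MiB per GiB

ALLOC_CPU_M = 191080      # c7a.48xlarge allocatable CPU, millicores
ALLOC_MEM_MI = 346 * GIB  # 346.0Gi as printed in the report (rounded)

# name -> (CPU millicores, memory MiB): job size plus 320m / 522Mi of hooks
RUNNERS = {
    "l-x86iavx512-16-32": (16320, 32 * GIB + 522),
    "l-x86iavx512-2-4": (2320, 4 * GIB + 522),
    "l-x86iavx512-37-68": (37320, 68 * GIB + 522),
    "l-x86iavx512-46-85": (46320, 85 * GIB + 522),
    "l-x86iavx512-8-16": (8320, 16 * GIB + 522),
    "l-x86iavx512-94-192": (94320, 189 * GIB + 522),
}


def homogeneous_fit(cpu_m: int, mem_mi: int):
    """Return (pods that fit, limiting resource) for a single runner type."""
    by_cpu = ALLOC_CPU_M // cpu_m
    by_mem = ALLOC_MEM_MI // mem_mi
    bottleneck = "CPU" if by_cpu <= by_mem else "MEM"  # assumed tie-break: CPU
    return min(by_cpu, by_mem), bottleneck


for name, (cpu_m, mem_mi) in RUNNERS.items():
    pods, bottleneck = homogeneous_fit(cpu_m, mem_mi)
    cpu_pct = 100 * pods * cpu_m / ALLOC_CPU_M
    print(f"{name}: {pods} pods, CPU {cpu_pct:.1f}%, bottleneck {bottleneck}")
```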

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 236

    Top 5 most efficient:
      #1 [5xl-x86iavx512-37-68]
         CPU:  97.7%  MEM:  99.0%  waste: 4.5c + 3.5Gi
      #2 [1xl-x86iavx512-2-4, 2xl-x86iavx512-37-68, 2xl-x86iavx512-46-85, 2xl-x86iavx512-8-16]
         CPU:  97.5%  MEM:  99.9%  waste: 4.8c + 466Mi
      #3 [12xl-x86iavx512-2-4, 3xl-x86iavx512-37-68, 1xl-x86iavx512-46-85]
         CPU:  97.4%  MEM:  99.7%  waste: 5.0c + 888Mi
      #4 [1xl-x86iavx512-16-32, 1xl-x86iavx512-2-4, 2xl-x86iavx512-37-68, 2xl-x86iavx512-46-85]
         CPU:  97.3%  MEM:  99.7%  waste: 5.2c + 988Mi
      #5 [8xl-x86iavx512-2-4, 2xl-x86iavx512-37-68, 2xl-x86iavx512-46-85]
         CPU:  97.3%  MEM:  99.4%  waste: 5.2c + 1.9Gi

    Bottom 5 least efficient (money on the table):
      #1 [1xl-x86iavx512-2-4, 9xl-x86iavx512-8-16, 1xl-x86iavx512-94-192]
         CPU:  89.8%  MEM:  99.0%  waste: 19.6c + 3.4Gi
      #2 [2xl-x86iavx512-16-32, 12xl-x86iavx512-2-4, 2xl-x86iavx512-8-16, 1xl-x86iavx512-94-192]
         CPU:  89.7%  MEM:  98.7%  waste: 19.6c + 4.4Gi
      #3 [4xl-x86iavx512-16-32, 5xl-x86iavx512-2-4, 1xl-x86iavx512-94-192]
         CPU:  89.6%  MEM:  98.9%  waste: 19.9c + 3.9Gi
      #4 [1xl-x86iavx512-16-32, 1xl-x86iavx512-2-4, 7xl-x86iavx512-8-16, 1xl-x86iavx512-94-192]
         CPU:  89.6%  MEM:  98.9%  waste: 19.9c + 3.9Gi
      #5 [2xl-x86iavx512-16-32, 1xl-x86iavx512-2-4, 5xl-x86iavx512-8-16, 1xl-x86iavx512-94-192]
         CPU:  89.4%  MEM:  98.7%  waste: 20.2c + 4.4Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: c7i.12xlarge
  Total: 48 vCPU, 96Gi advertised (88.8Gi actual)
  Kubelet reserved: 190m CPU, 2.9Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 47440m CPU (47.4 cores), 85.0Gi RAM

  Runners targeting this node:
    - l-x86iamx-14-27: 14320m CPU, 27.5Gi RAM (job: 14c+27.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-22-41: 22320m CPU, 41.5Gi RAM (job: 22c+41.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-46-84: 46320m CPU, 84.5Gi RAM (job: 46c+84.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-8-16: 8320m CPU, 16.5Gi RAM (job: 8c+16.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iamx-14-27: 3 pods
      CPU:  90.6% (42960m / 47440m) waste: 4480m (4.5 cores)
      MEM:  97.1% (82.5Gi / 85.0Gi) waste: 2.5Gi
      Bottleneck: CPU
    l-x86iamx-22-41: 2 pods
      CPU:  94.1% (44640m / 47440m) waste: 2800m (2.8 cores)
      MEM:  97.6% (83.0Gi / 85.0Gi) waste: 2.0Gi
      Bottleneck: CPU
    l-x86iamx-46-84: 1 pods
      CPU:  97.6% (46320m / 47440m) waste: 1120m (1.1 cores)
      MEM:  99.4% (84.5Gi / 85.0Gi) waste: 530Mi
      Bottleneck: CPU
    l-x86iamx-8-16: 5 pods
      CPU:  87.7% (41600m / 47440m) waste: 5840m (5.8 cores)
      MEM:  97.1% (82.5Gi / 85.0Gi) waste: 2.5Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 8

    Top 5 most efficient:
      #1 [1xl-x86iamx-46-84]
         CPU:  97.6%  MEM:  99.4%  waste: 1.1c + 530Mi
      #2 [2xl-x86iamx-22-41]
         CPU:  94.1%  MEM:  97.6%  waste: 2.8c + 2.0Gi
      #3 [3xl-x86iamx-14-27]
         CPU:  90.6%  MEM:  97.1%  waste: 4.5c + 2.5Gi
      #4 [5xl-x86iamx-8-16]
         CPU:  87.7%  MEM:  97.1%  waste: 5.8c + 2.5Gi
      #5 [1xl-x86iamx-14-27, 3xl-x86iamx-8-16]
         CPU:  82.8%  MEM:  90.6%  waste: 8.2c + 8.0Gi

    Bottom 3 least efficient (money on the table):
      #1 [1xl-x86iamx-22-41, 2xl-x86iamx-8-16]
         CPU:  82.1%  MEM:  87.7%  waste: 8.5c + 10.5Gi
      #2 [2xl-x86iamx-14-27, 1xl-x86iamx-8-16]
         CPU:  77.9%  MEM:  84.1%  waste: 10.5c + 13.5Gi
      #3 [1xl-x86iamx-14-27, 1xl-x86iamx-22-41]
         CPU:  77.2%  MEM:  81.2%  waste: 10.8c + 16.0Gi
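
A "maximal" combo, as the heading describes it, is a mix of pods that fits on the node and leaves no room for one more pod of any type. The brute-force sketch below enumerates such mixes for c7i.12xlarge and arrives at the same count of 8. It is an illustration under assumptions, not the analyzer's algorithm: function names are hypothetical and memory uses the rounded 85.0Gi figure.

```python
from itertools import product

GIB = 1024  # MiB per GiB

ALLOC_CPU_M = 47440      # c7i.12xlarge allocatable CPU, millicores
ALLOC_MEM_MI = 85 * GIB  # 85.0Gi as printed in the report (rounded)

# name -> (CPU millicores, memory MiB): job size plus 320m / 522Mi of hooks
RUNNERS = {
    "l-x86iamx-14-27": (14320, 27 * GIB + 522),
    "l-x86iamx-22-41": (22320, 41 * GIB + 522),
    "l-x86iamx-46-84": (46320, 84 * GIB + 522),
    "l-x86iamx-8-16": (8320, 16 * GIB + 522),
}


def fits(counts) -> bool:
    """True if the given pod counts (one per runner type) fit on the node."""
    cpu = sum(n * c for n, (c, _) in zip(counts, RUNNERS.values()))
    mem = sum(n * m for n, (_, m) in zip(counts, RUNNERS.values()))
    return cpu <= ALLOC_CPU_M and mem <= ALLOC_MEM_MI


def maximal_combos():
    # Upper bound per type: how many pods fit on an otherwise empty node.
    limits = [min(ALLOC_CPU_M // c, ALLOC_MEM_MI // m) for c, m in RUNNERS.values()]
    combos = []
    for counts in product(*(range(k + 1) for k in limits)):
        if not any(counts) or not fits(counts):
            continue
        # Maximal: adding one more pod of any type no longer fits.
        grown = (counts[:i] + (n + 1,) + counts[i + 1:] for i, n in enumerate(counts))
        if not any(fits(g) for g in grown):
            combos.append(counts)
    return combos


for combo in maximal_combos():
    print(dict(zip(RUNNERS, combo)))
print(len(maximal_combos()))  # 8
```

Exhaustive enumeration is fine for four runner types; for nodes with hundreds of maximal combos a pruned search would likely be preferable.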

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: c7i.metal-24xl
  Total: 96 vCPU, 192Gi advertised (177.6Gi actual)
  Kubelet reserved: 310m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 95320m CPU (95.3 cores), 168.4Gi RAM

  Runners targeting this node:
    - l-bx86iamx-92-167: 92320m CPU, 167.5Gi RAM (job: 92c+167.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iamx-92-167: 1 pods
      CPU:  96.9% (92320m / 95320m) waste: 3000m (3.0 cores)
      MEM:  99.5% (167.5Gi / 168.4Gi) waste: 936Mi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-bx86iamx-92-167]
         CPU:  96.9%  MEM:  99.5%  waste: 3.0c + 936Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g4dn.12xlarge
  Total: 48 vCPU, 192Gi advertised (177.6Gi actual), 4 GPU
  Kubelet reserved: 190m CPU, 2.9Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 47290m CPU (47.3 cores), 173.5Gi RAM, 4 GPU

  Runners targeting this node:
    - l-x86iavx512-45-172-t4-4: 45320m CPU, 172.5Gi RAM, 4 GPU (job: 45c+172.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iavx512-45-172-t4-4: 1 pods
      CPU:  95.8% (45320m / 47290m) waste: 1970m (2.0 cores)
      MEM:  99.4% (172.5Gi / 173.5Gi) waste: 1.0Gi
      GPU: 100.0% (4 / 4)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86iavx512-45-172-t4-4]
         CPU:  95.8%  MEM:  99.4%  GPU: 100.0%  waste: 2.0c + 1.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g4dn.8xlarge
  Total: 32 vCPU, 128Gi advertised (118.4Gi actual), 1 GPU
  Kubelet reserved: 150m CPU, 993Mi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 31330m CPU (31.3 cores), 116.2Gi RAM, 1 GPU

  Runners targeting this node:
    - l-x86iavx512-29-115-t4: 29320m CPU, 115.5Gi RAM, 1 GPU (job: 29c+115.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iavx512-29-115-t4: 1 pods
      CPU:  93.6% (29320m / 31330m) waste: 2010m (2.0 cores)
      MEM:  99.4% (115.5Gi / 116.2Gi) waste: 712Mi
      GPU: 100.0% (1 / 1)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86iavx512-29-115-t4]
         CPU:  93.6%  MEM:  99.4%  GPU: 100.0%  waste: 2.0c + 712Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g4dn.metal
  Total: 96 vCPU, 384Gi advertised (355.2Gi actual), 8 GPU
  Kubelet reserved: 310m CPU, 8.3Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 95170m CPU (95.2 cores), 345.7Gi RAM, 8 GPU

  Runners targeting this node:
    - l-bx86iavx512-11-43-t4-1: 60320m CPU, 43.5Gi RAM, 1 GPU (job: 60c+43.0Gi, hooks: 320m+522Mi)
    - l-bx86iavx512-47-172-t4-4: 47320m CPU, 172.5Gi RAM, 4 GPU (job: 47c+172.0Gi, hooks: 320m+522Mi)
    - l-bx86iavx512-94-344-t4-8: 94320m CPU, 344.5Gi RAM, 8 GPU (job: 94c+344.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iavx512-11-43-t4-1: 1 pods
      CPU:  63.4% (60320m / 95170m) waste: 34850m (34.9 cores)
      MEM:  12.6% (43.5Gi / 345.7Gi) waste: 302.2Gi
      GPU:  12.5% (1 / 8)
      Bottleneck: CPU
    l-bx86iavx512-47-172-t4-4: 2 pods
      CPU:  99.4% (94640m / 95170m) waste: 530m (0.5 cores)
      MEM:  99.8% (345.0Gi / 345.7Gi) waste: 708Mi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-bx86iavx512-94-344-t4-8: 1 pods
      CPU:  99.1% (94320m / 95170m) waste: 850m (0.8 cores)
      MEM:  99.7% (344.5Gi / 345.7Gi) waste: 1.2Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 3

    Top 3 most efficient:
      #1 [2xl-bx86iavx512-47-172-t4-4]
         CPU:  99.4%  MEM:  99.8%  GPU: 100.0%  waste: 0.5c + 708Mi
      #2 [1xl-bx86iavx512-94-344-t4-8]
         CPU:  99.1%  MEM:  99.7%  GPU: 100.0%  waste: 0.8c + 1.2Gi
      #3 [1xl-bx86iavx512-11-43-t4-1]
         CPU:  63.4%  MEM:  12.6%  GPU:  12.5%  waste: 34.9c + 302.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g5.12xlarge
  Total: 48 vCPU, 192Gi advertised (177.6Gi actual), 4 GPU
  Kubelet reserved: 190m CPU, 8.3Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 47290m CPU (47.3 cores), 168.1Gi RAM, 4 GPU

  Runners targeting this node:
    - l-x86aavx2-45-167-a10g-4: 45320m CPU, 167.5Gi RAM, 4 GPU (job: 45c+167.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-45-167-a10g-4: 1 pods
      CPU:  95.8% (45320m / 47290m) waste: 1970m (2.0 cores)
      MEM:  99.6% (167.5Gi / 168.1Gi) waste: 616Mi
      GPU: 100.0% (4 / 4)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-45-167-a10g-4]
         CPU:  95.8%  MEM:  99.6%  GPU: 100.0%  waste: 2.0c + 616Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g5.48xlarge
  Total: 192 vCPU, 768Gi advertised (710.4Gi actual), 8 GPU
  Kubelet reserved: 550m CPU, 4.1Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 190930m CPU (190.9 cores), 705.1Gi RAM, 8 GPU

  Runners targeting this node:
    - l-x86aavx2-189-704-a10g-8: 189320m CPU, 704.5Gi RAM, 8 GPU (job: 189c+704.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-189-704-a10g-8: 1 pods
      CPU:  99.2% (189320m / 190930m) waste: 1610m (1.6 cores)
      MEM:  99.9% (704.5Gi / 705.1Gi) waste: 627Mi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-189-704-a10g-8]
         CPU:  99.2%  MEM:  99.9%  GPU: 100.0%  waste: 1.6c + 627Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g5.8xlarge
  Total: 32 vCPU, 128Gi advertised (118.4Gi actual), 1 GPU
  Kubelet reserved: 150m CPU, 2.9Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 31330m CPU (31.3 cores), 114.3Gi RAM, 1 GPU

  Runners targeting this node:
    - l-x86aavx2-29-113-a10g: 29320m CPU, 113.5Gi RAM, 1 GPU (job: 29c+113.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-29-113-a10g: 1 pods
      CPU:  93.6% (29320m / 31330m) waste: 2010m (2.0 cores)
      MEM:  99.3% (113.5Gi / 114.3Gi) waste: 824Mi
      GPU: 100.0% (1 / 1)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-29-113-a10g]
         CPU:  93.6%  MEM:  99.3%  GPU: 100.0%  waste: 2.0c + 824Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g6.12xlarge
  Total: 48 vCPU, 192Gi advertised (177.6Gi actual), 4 GPU
  Kubelet reserved: 190m CPU, 2.9Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 47290m CPU (47.3 cores), 173.5Gi RAM, 4 GPU

  Runners targeting this node:
    - l-x86aavx2-45-172-l4-4: 45320m CPU, 172.5Gi RAM, 4 GPU (job: 45c+172.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-45-172-l4-4: 1 pods
      CPU:  95.8% (45320m / 47290m) waste: 1970m (2.0 cores)
      MEM:  99.4% (172.5Gi / 173.5Gi) waste: 1.0Gi
      GPU: 100.0% (4 / 4)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-45-172-l4-4]
         CPU:  95.8%  MEM:  99.4%  GPU: 100.0%  waste: 2.0c + 1.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: g6.8xlarge
  Total: 32 vCPU, 128Gi advertised (118.4Gi actual), 1 GPU
  Kubelet reserved: 150m CPU, 2.9Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 31330m CPU (31.3 cores), 114.3Gi RAM, 1 GPU

  Runners targeting this node:
    - l-x86aavx2-29-113-l4: 29320m CPU, 113.5Gi RAM, 1 GPU (job: 29c+113.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx2-29-113-l4: 1 pods
      CPU:  93.6% (29320m / 31330m) waste: 2010m (2.0 cores)
      MEM:  99.3% (113.5Gi / 114.3Gi) waste: 824Mi
      GPU: 100.0% (1 / 1)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx2-29-113-l4]
         CPU:  93.6%  MEM:  99.3%  GPU: 100.0%  waste: 2.0c + 824Mi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m6i.32xlarge
  Total: 128 vCPU, 512Gi advertised (473.7Gi actual)
  Kubelet reserved: 390m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 127240m CPU (127.2 cores), 464.5Gi RAM

  Runners targeting this node:
    - l-x86aavx512-125-463: 125320m CPU, 463.5Gi RAM (job: 125c+463.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86aavx512-125-463: 1 pods
      CPU:  98.5% (125320m / 127240m) waste: 1920m (1.9 cores)
      MEM:  99.8% (463.5Gi / 464.5Gi) waste: 1.0Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-x86aavx512-125-463]
         CPU:  98.5%  MEM:  99.8%  waste: 1.9c + 1.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m7g.8xlarge
  Total: 32 vCPU, 128Gi advertised (118.4Gi actual)
  Kubelet reserved: 150m CPU, 2.9Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 31480m CPU (31.5 cores), 114.6Gi RAM

  Runners targeting this node:
    - l-arm64g3-16-62: 16320m CPU, 62.5Gi RAM (job: 16c+62.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-arm64g3-16-62: 1 pods
      CPU:  51.8% (16320m / 31480m) waste: 15160m (15.2 cores)
      MEM:  54.5% (62.5Gi / 114.6Gi) waste: 52.1Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-arm64g3-16-62]
         CPU:  51.8%  MEM:  54.5%  waste: 15.2c + 52.1Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m7g.metal
  Total: 64 vCPU, 256Gi advertised (236.9Gi actual)
  Kubelet reserved: 230m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 63400m CPU (63.4 cores), 227.7Gi RAM

  Runners targeting this node:
    - l-barm64g3-62-226: 62320m CPU, 226.5Gi RAM (job: 62c+226.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-barm64g3-62-226: 1 pods
      CPU:  98.3% (62320m / 63400m) waste: 1080m (1.1 cores)
      MEM:  99.5% (226.5Gi / 227.7Gi) waste: 1.2Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-barm64g3-62-226]
         CPU:  98.3%  MEM:  99.5%  waste: 1.1c + 1.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m7i.48xlarge
  Total: 192 vCPU, 768Gi advertised (710.4Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 701.2Gi RAM

  Runners targeting this node:
    - l-x86iamx-32-128: 32320m CPU, 128.5Gi RAM (job: 32c+128.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-8-32: 8320m CPU, 32.5Gi RAM (job: 8c+32.0Gi, hooks: 320m+522Mi)
    - l-x86iavx2-40-160: 40320m CPU, 160.5Gi RAM (job: 40c+160.0Gi, hooks: 320m+522Mi)
    - l-x86iavx2-8-32: 8320m CPU, 32.5Gi RAM (job: 8c+32.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iamx-32-128: 5 pods
      CPU:  84.6% (161600m / 191080m) waste: 29480m (29.5 cores)
      MEM:  91.6% (642.5Gi / 701.2Gi) waste: 58.7Gi
      Bottleneck: CPU
    l-x86iamx-8-32: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  97.4% (682.7Gi / 701.2Gi) waste: 18.5Gi
      Bottleneck: MEM
    l-x86iavx2-40-160: 4 pods
      CPU:  84.4% (161280m / 191080m) waste: 29800m (29.8 cores)
      MEM:  91.6% (642.0Gi / 701.2Gi) waste: 59.2Gi
      Bottleneck: CPU
    l-x86iavx2-8-32: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  97.4% (682.7Gi / 701.2Gi) waste: 18.5Gi
      Bottleneck: MEM

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 131

    Top 5 most efficient:
      #1 [1xl-x86iamx-32-128, 17xl-x86iamx-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi
      #2 [1xl-x86iamx-32-128, 16xl-x86iamx-8-32, 1xl-x86iavx2-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi
      #3 [1xl-x86iamx-32-128, 15xl-x86iamx-8-32, 2xl-x86iavx2-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi
      #4 [1xl-x86iamx-32-128, 14xl-x86iamx-8-32, 3xl-x86iavx2-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi
      #5 [1xl-x86iamx-32-128, 13xl-x86iamx-8-32, 4xl-x86iavx2-8-32]
         CPU:  90.9%  MEM:  97.1%  waste: 17.3c + 20.0Gi

    Bottom 5 least efficient (money on the table):
      #1 [1xl-x86iamx-32-128, 1xl-x86iamx-8-32, 3xl-x86iavx2-40-160, 1xl-x86iavx2-8-32]
         CPU:  88.9%  MEM:  96.3%  waste: 21.2c + 26.2Gi
      #2 [1xl-x86iamx-32-128, 3xl-x86iavx2-40-160, 2xl-x86iavx2-8-32]
         CPU:  88.9%  MEM:  96.3%  waste: 21.2c + 26.2Gi
      #3 [4xl-x86iamx-32-128, 1xl-x86iavx2-40-160]
         CPU:  88.8%  MEM:  96.2%  waste: 21.5c + 26.7Gi
      #4 [1xl-x86iamx-8-32, 4xl-x86iavx2-40-160]
         CPU:  88.8%  MEM:  96.2%  waste: 21.5c + 26.7Gi
      #5 [4xl-x86iavx2-40-160, 1xl-x86iavx2-8-32]
         CPU:  88.8%  MEM:  96.2%  waste: 21.5c + 26.7Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m8g.16xlarge
  Total: 64 vCPU, 256Gi advertised (236.9Gi actual)
  Kubelet reserved: 230m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 63400m CPU (63.4 cores), 227.7Gi RAM

  Runners targeting this node:
    - l-barm64g4-62-226: 62320m CPU, 226.5Gi RAM (job: 62c+226.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-barm64g4-62-226: 1 pods
      CPU:  98.3% (62320m / 63400m) waste: 1080m (1.1 cores)
      MEM:  99.5% (226.5Gi / 227.7Gi) waste: 1.2Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-barm64g4-62-226]
         CPU:  98.3%  MEM:  99.5%  waste: 1.1c + 1.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: m8g.48xlarge
  Total: 192 vCPU, 768Gi advertised (710.4Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 701.2Gi RAM

  Runners targeting this node:
    - l-arm64g4-16-62: 16320m CPU, 62.5Gi RAM (job: 16c+62.0Gi, hooks: 320m+522Mi)
    - rel-l-arm64g4-16-62: 16320m CPU, 62.5Gi RAM (job: 16c+62.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-arm64g4-16-62: 11 pods
      CPU:  94.0% (179520m / 191080m) waste: 11560m (11.6 cores)
      MEM:  98.1% (687.6Gi / 701.2Gi) waste: 13.6Gi
      Bottleneck: CPU
    rel-l-arm64g4-16-62: 11 pods
      CPU:  94.0% (179520m / 191080m) waste: 11560m (11.6 cores)
      MEM:  98.1% (687.6Gi / 701.2Gi) waste: 13.6Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 12

    Top 5 most efficient:
      #1 [11xl-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #2 [10xl-arm64g4-16-62, 1xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #3 [9xl-arm64g4-16-62, 2xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #4 [8xl-arm64g4-16-62, 3xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #5 [7xl-arm64g4-16-62, 4xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi

    Bottom 5 least efficient (money on the table):
      #1 [4xl-arm64g4-16-62, 7xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #2 [3xl-arm64g4-16-62, 8xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #3 [2xl-arm64g4-16-62, 9xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #4 [1xl-arm64g4-16-62, 10xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
      #5 [11xrel-l-arm64g4-16-62]
         CPU:  94.0%  MEM:  98.1%  waste: 11.6c + 13.6Gi
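
Reader's note: the allocatable and homogeneous-packing figures above can be reproduced by hand. The sketch below is not the simulator's code; it is a minimal reconstruction that assumes allocatable CPU is total vCPU minus kubelet reservation minus DaemonSet overhead, and that the pod count is the largest whole number of identical runners that fits in both CPU and memory. The names `Node`, `Runner`, `allocatable_cpu_m`, `homogeneous_pods`, and `cpu_utilization` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    cpu_m: int      # allocatable CPU, in millicores
    mem_gi: float   # allocatable memory, in Gi (rounded, as printed in the report)


@dataclass(frozen=True)
class Runner:
    cpu_m: int      # job + hooks CPU request, in millicores
    mem_gi: float   # job + hooks memory request, in Gi (rounded)


def allocatable_cpu_m(total_vcpu: int, kubelet_m: int, daemonset_m: int) -> int:
    """Assumed formula: whatever the kubelet and DaemonSets do not claim."""
    return total_vcpu * 1000 - kubelet_m - daemonset_m


def homogeneous_pods(node: Node, runner: Runner) -> int:
    """Largest count of identical runner pods that fits in both CPU and memory."""
    by_cpu = node.cpu_m // runner.cpu_m
    by_mem = int(node.mem_gi // runner.mem_gi)
    return min(by_cpu, by_mem)


def cpu_utilization(node: Node, runner: Runner) -> float:
    """CPU utilization in percent when the node is filled with one runner type."""
    return 100.0 * homogeneous_pods(node, runner) * runner.cpu_m / node.cpu_m


# m8g.48xlarge figures copied from the report above.
m8g = Node(cpu_m=allocatable_cpu_m(192, 550, 370), mem_gi=701.2)
arm_runner = Runner(cpu_m=16320, mem_gi=62.5)

print(m8g.cpu_m)                                  # 191080
print(homogeneous_pods(m8g, arm_runner))          # 11
print(round(cpu_utilization(m8g, arm_runner), 1)) # 94.0
```

With the m8g.48xlarge inputs this gives 191080m allocatable, 11 pods, and 94.0% CPU, in line with the report. Memory is taken from the rounded Gi values in the log, so a runner whose memory fit is within rounding error of the limit could come out differently from the real tool.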

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: p4d.24xlarge
  Total: 96 vCPU, 1152Gi advertised (1065.2Gi actual), 8 GPU
  Kubelet reserved: 310m CPU, 3.0Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 95170m CPU (95.2 cores), 1060.9Gi RAM, 8 GPU

  Runners targeting this node:
    - l-bx86iavx512-88-1000-a100-8: 88320m CPU, 1000.5Gi RAM, 8 GPU (job: 88c+1000.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-11-125-a100: 11320m CPU, 125.5Gi RAM, 1 GPU (job: 11c+125.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-22-250-a100-2: 22320m CPU, 250.5Gi RAM, 2 GPU (job: 22c+250.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-44-500-a100-4: 44320m CPU, 500.5Gi RAM, 4 GPU (job: 44c+500.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iavx512-88-1000-a100-8: 1 pods
      CPU:  92.8% (88320m / 95170m) waste: 6850m (6.8 cores)
      MEM:  94.3% (1000.5Gi / 1060.9Gi) waste: 60.4Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iavx512-11-125-a100: 8 pods
      CPU:  95.2% (90560m / 95170m) waste: 4610m (4.6 cores)
      MEM:  94.6% (1004.1Gi / 1060.9Gi) waste: 56.9Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iavx512-22-250-a100-2: 4 pods
      CPU:  93.8% (89280m / 95170m) waste: 5890m (5.9 cores)
      MEM:  94.4% (1002.0Gi / 1060.9Gi) waste: 58.9Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iavx512-44-500-a100-4: 2 pods
      CPU:  93.1% (88640m / 95170m) waste: 6530m (6.5 cores)
      MEM:  94.4% (1001.0Gi / 1060.9Gi) waste: 59.9Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 10

    Top 5 most efficient:
      #1 [8xl-x86iavx512-11-125-a100]
         CPU:  95.2%  MEM:  94.6%  GPU: 100.0%  waste: 4.6c + 56.9Gi
      #2 [6xl-x86iavx512-11-125-a100, 1xl-x86iavx512-22-250-a100-2]
         CPU:  94.8%  MEM:  94.6%  GPU: 100.0%  waste: 4.9c + 57.4Gi
      #3 [4xl-x86iavx512-11-125-a100, 2xl-x86iavx512-22-250-a100-2]
         CPU:  94.5%  MEM:  94.5%  GPU: 100.0%  waste: 5.2c + 57.9Gi
      #4 [4xl-x86iavx512-11-125-a100, 1xl-x86iavx512-44-500-a100-4]
         CPU:  94.1%  MEM:  94.5%  GPU: 100.0%  waste: 5.6c + 58.4Gi
      #5 [2xl-x86iavx512-11-125-a100, 3xl-x86iavx512-22-250-a100-2]
         CPU:  94.1%  MEM:  94.5%  GPU: 100.0%  waste: 5.6c + 58.4Gi

    Bottom 5 least efficient (money on the table):
      #1 [2xl-x86iavx512-11-125-a100, 1xl-x86iavx512-22-250-a100-2, 1xl-x86iavx512-44-500-a100-4]
         CPU:  93.8%  MEM:  94.4%  GPU: 100.0%  waste: 5.9c + 58.9Gi
      #2 [4xl-x86iavx512-22-250-a100-2]
         CPU:  93.8%  MEM:  94.4%  GPU: 100.0%  waste: 5.9c + 58.9Gi
      #3 [2xl-x86iavx512-22-250-a100-2, 1xl-x86iavx512-44-500-a100-4]
         CPU:  93.5%  MEM:  94.4%  GPU: 100.0%  waste: 6.2c + 59.4Gi
      #4 [2xl-x86iavx512-44-500-a100-4]
         CPU:  93.1%  MEM:  94.4%  GPU: 100.0%  waste: 6.5c + 59.9Gi
      #5 [1xl-bx86iavx512-88-1000-a100-8]
         CPU:  92.8%  MEM:  94.3%  GPU: 100.0%  waste: 6.8c + 60.4Gi
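
Reader's note: "maximal mixed combos" reads as the set of pod mixes that fit on the node and cannot accept one more pod of any runner type. The sketch below is a brute-force illustration of that definition, not the simulator's algorithm; `fits` and `maximal_combos` are hypothetical names, every runner is assumed to request a positive amount of at least one resource, and the MiB figures are converted from the rounded Gi values in the report (job Gi x 1024 + 522Mi hooks).

```python
from itertools import product
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]  # one integer per resource, e.g. (cpu_m, mem_mi, gpu)


def fits(counts: Sequence[int], runners: Sequence[Vector], capacity: Vector) -> bool:
    """True if `counts[i]` pods of `runners[i]` fit within every resource limit."""
    for dim, cap in enumerate(capacity):
        used = sum(c * r[dim] for c, r in zip(counts, runners))
        if used > cap:
            return False
    return True


def maximal_combos(runners: Sequence[Vector], capacity: Vector) -> List[Tuple[int, ...]]:
    """Enumerate combos that fit and leave no room for one more pod of any type."""
    # Upper bound per runner type: how many fit when the node holds nothing else.
    limits = [
        min(cap // req for req, cap in zip(r, capacity) if req > 0)
        for r in runners
    ]
    result = []
    for counts in product(*(range(limit + 1) for limit in limits)):
        if not fits(counts, runners, capacity):
            continue
        grown = (
            counts[:j] + (counts[j] + 1,) + counts[j + 1:]
            for j in range(len(runners))
        )
        if not any(fits(g, runners, capacity) for g in grown):
            result.append(counts)
    return result


# p4d.24xlarge: (CPU millicores, memory MiB, GPUs); memory is approximate.
P4D_CAPACITY = (95170, 1086361, 8)
P4D_RUNNERS = [
    (88320, 1024522, 8),  # l-bx86iavx512-88-1000-a100-8
    (11320, 128522, 1),   # l-x86iavx512-11-125-a100
    (22320, 256522, 2),   # l-x86iavx512-22-250-a100-2
    (44320, 512522, 4),   # l-x86iavx512-44-500-a100-4
]

print(len(maximal_combos(P4D_RUNNERS, P4D_CAPACITY)))  # 10
```

On p4d.24xlarge the GPU count is what closes out every combo: any mix using fewer than 8 GPUs still has CPU and memory for one more 1-GPU runner, so the maximal combos are exactly the ways to split 8 GPUs into runners of 1, 2, 4, or 8, which is 10. Exhaustive enumeration is fine at this scale but grows quickly with the number of runner types.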

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: p5.48xlarge
  Total: 192 vCPU, 2048Gi advertised (1894.4Gi actual), 8 GPU
  Kubelet reserved: 550m CPU, 2.5Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 190930m CPU (190.9 cores), 1890.7Gi RAM, 8 GPU

  Runners targeting this node:
    - l-bx86iamx-176-1800-h100-8: 176320m CPU, 1800.5Gi RAM, 8 GPU (job: 176c+1800.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-22-225-h100: 22320m CPU, 225.5Gi RAM, 1 GPU (job: 22c+225.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-44-450-h100-2: 44320m CPU, 450.5Gi RAM, 2 GPU (job: 44c+450.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-88-900-h100-4: 88320m CPU, 900.5Gi RAM, 4 GPU (job: 88c+900.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iamx-176-1800-h100-8: 1 pods
      CPU:  92.3% (176320m / 190930m) waste: 14610m (14.6 cores)
      MEM:  95.2% (1800.5Gi / 1890.7Gi) waste: 90.2Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-22-225-h100: 8 pods
      CPU:  93.5% (178560m / 190930m) waste: 12370m (12.4 cores)
      MEM:  95.4% (1804.1Gi / 1890.7Gi) waste: 86.6Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-44-450-h100-2: 4 pods
      CPU:  92.9% (177280m / 190930m) waste: 13650m (13.7 cores)
      MEM:  95.3% (1802.0Gi / 1890.7Gi) waste: 88.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-88-900-h100-4: 2 pods
      CPU:  92.5% (176640m / 190930m) waste: 14290m (14.3 cores)
      MEM:  95.3% (1801.0Gi / 1890.7Gi) waste: 89.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 10

    Top 5 most efficient:
      #1 [8xl-x86iamx-22-225-h100]
         CPU:  93.5%  MEM:  95.4%  GPU: 100.0%  waste: 12.4c + 86.6Gi
      #2 [6xl-x86iamx-22-225-h100, 1xl-x86iamx-44-450-h100-2]
         CPU:  93.4%  MEM:  95.4%  GPU: 100.0%  waste: 12.7c + 87.1Gi
      #3 [4xl-x86iamx-22-225-h100, 2xl-x86iamx-44-450-h100-2]
         CPU:  93.2%  MEM:  95.4%  GPU: 100.0%  waste: 13.0c + 87.6Gi
      #4 [4xl-x86iamx-22-225-h100, 1xl-x86iamx-88-900-h100-4]
         CPU:  93.0%  MEM:  95.3%  GPU: 100.0%  waste: 13.3c + 88.2Gi
      #5 [2xl-x86iamx-22-225-h100, 3xl-x86iamx-44-450-h100-2]
         CPU:  93.0%  MEM:  95.3%  GPU: 100.0%  waste: 13.3c + 88.2Gi

    Bottom 5 least efficient (money on the table):
      #1 [2xl-x86iamx-22-225-h100, 1xl-x86iamx-44-450-h100-2, 1xl-x86iamx-88-900-h100-4]
         CPU:  92.9%  MEM:  95.3%  GPU: 100.0%  waste: 13.7c + 88.7Gi
      #2 [4xl-x86iamx-44-450-h100-2]
         CPU:  92.9%  MEM:  95.3%  GPU: 100.0%  waste: 13.7c + 88.7Gi
      #3 [2xl-x86iamx-44-450-h100-2, 1xl-x86iamx-88-900-h100-4]
         CPU:  92.7%  MEM:  95.3%  GPU: 100.0%  waste: 14.0c + 89.2Gi
      #4 [2xl-x86iamx-88-900-h100-4]
         CPU:  92.5%  MEM:  95.3%  GPU: 100.0%  waste: 14.3c + 89.7Gi
      #5 [1xl-bx86iamx-176-1800-h100-8]
         CPU:  92.3%  MEM:  95.2%  GPU: 100.0%  waste: 14.6c + 90.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: p6-b200.48xlarge
  Total: 192 vCPU, 2048Gi advertised (1894.4Gi actual), 8 GPU
  Kubelet reserved: 550m CPU, 2.5Gi RAM
  DaemonSet overhead: 520m CPU, 1.2Gi RAM
  Allocatable for runners: 190930m CPU (190.9 cores), 1890.7Gi RAM, 8 GPU

  Runners targeting this node:
    - l-bx86iamx-176-1800-b200-8: 176320m CPU, 1800.5Gi RAM, 8 GPU (job: 176c+1800.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-22-225-b200: 22320m CPU, 225.5Gi RAM, 1 GPU (job: 22c+225.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-44-450-b200-2: 44320m CPU, 450.5Gi RAM, 2 GPU (job: 44c+450.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-88-900-b200-4: 88320m CPU, 900.5Gi RAM, 4 GPU (job: 88c+900.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-bx86iamx-176-1800-b200-8: 1 pods
      CPU:  92.3% (176320m / 190930m) waste: 14610m (14.6 cores)
      MEM:  95.2% (1800.5Gi / 1890.7Gi) waste: 90.2Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-22-225-b200: 8 pods
      CPU:  93.5% (178560m / 190930m) waste: 12370m (12.4 cores)
      MEM:  95.4% (1804.1Gi / 1890.7Gi) waste: 86.6Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-44-450-b200-2: 4 pods
      CPU:  92.9% (177280m / 190930m) waste: 13650m (13.7 cores)
      MEM:  95.3% (1802.0Gi / 1890.7Gi) waste: 88.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU
    l-x86iamx-88-900-b200-4: 2 pods
      CPU:  92.5% (176640m / 190930m) waste: 14290m (14.3 cores)
      MEM:  95.3% (1801.0Gi / 1890.7Gi) waste: 89.7Gi
      GPU: 100.0% (8 / 8)
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 10

    Top 5 most efficient:
      #1 [8xl-x86iamx-22-225-b200]
         CPU:  93.5%  MEM:  95.4%  GPU: 100.0%  waste: 12.4c + 86.6Gi
      #2 [6xl-x86iamx-22-225-b200, 1xl-x86iamx-44-450-b200-2]
         CPU:  93.4%  MEM:  95.4%  GPU: 100.0%  waste: 12.7c + 87.1Gi
      #3 [4xl-x86iamx-22-225-b200, 2xl-x86iamx-44-450-b200-2]
         CPU:  93.2%  MEM:  95.4%  GPU: 100.0%  waste: 13.0c + 87.6Gi
      #4 [4xl-x86iamx-22-225-b200, 1xl-x86iamx-88-900-b200-4]
         CPU:  93.0%  MEM:  95.3%  GPU: 100.0%  waste: 13.3c + 88.2Gi
      #5 [2xl-x86iamx-22-225-b200, 3xl-x86iamx-44-450-b200-2]
         CPU:  93.0%  MEM:  95.3%  GPU: 100.0%  waste: 13.3c + 88.2Gi

    Bottom 5 least efficient (money on the table):
      #1 [2xl-x86iamx-22-225-b200, 1xl-x86iamx-44-450-b200-2, 1xl-x86iamx-88-900-b200-4]
         CPU:  92.9%  MEM:  95.3%  GPU: 100.0%  waste: 13.7c + 88.7Gi
      #2 [4xl-x86iamx-44-450-b200-2]
         CPU:  92.9%  MEM:  95.3%  GPU: 100.0%  waste: 13.7c + 88.7Gi
      #3 [2xl-x86iamx-44-450-b200-2, 1xl-x86iamx-88-900-b200-4]
         CPU:  92.7%  MEM:  95.3%  GPU: 100.0%  waste: 14.0c + 89.2Gi
      #4 [2xl-x86iamx-88-900-b200-4]
         CPU:  92.5%  MEM:  95.3%  GPU: 100.0%  waste: 14.3c + 89.7Gi
      #5 [1xl-bx86iamx-176-1800-b200-8]
         CPU:  92.3%  MEM:  95.2%  GPU: 100.0%  waste: 14.6c + 90.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: r7a.48xlarge
  Total: 192 vCPU, 1536Gi advertised (1420.8Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 1411.6Gi RAM

  Runners targeting this node:
    - l-x86iavx512-16-128: 16320m CPU, 128.5Gi RAM (job: 16c+128.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-32-256: 32320m CPU, 256.5Gi RAM (job: 32c+256.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-48-384: 48320m CPU, 384.5Gi RAM (job: 48c+384.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-8-64: 8320m CPU, 64.5Gi RAM (job: 8c+64.0Gi, hooks: 320m+522Mi)
    - l-x86iavx512-94-768: 94320m CPU, 740.5Gi RAM (job: 94c+740.0Gi, hooks: 320m+522Mi)
    - rel-l-x86iavx512-8-64: 8320m CPU, 64.5Gi RAM (job: 8c+64.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iavx512-16-128: 10 pods
      CPU:  85.4% (163200m / 191080m) waste: 27880m (27.9 cores)
      MEM:  91.0% (1285.1Gi / 1411.6Gi) waste: 126.5Gi
      Bottleneck: MEM
    l-x86iavx512-32-256: 5 pods
      CPU:  84.6% (161600m / 191080m) waste: 29480m (29.5 cores)
      MEM:  90.9% (1282.5Gi / 1411.6Gi) waste: 129.1Gi
      Bottleneck: CPU
    l-x86iavx512-48-384: 3 pods
      CPU:  75.9% (144960m / 191080m) waste: 46120m (46.1 cores)
      MEM:  81.7% (1153.5Gi / 1411.6Gi) waste: 258.1Gi
      Bottleneck: CPU
    l-x86iavx512-8-64: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  96.0% (1354.7Gi / 1411.6Gi) waste: 56.9Gi
      Bottleneck: MEM
    l-x86iavx512-94-768: 1 pods
      CPU:  49.4% (94320m / 191080m) waste: 96760m (96.8 cores)
      MEM:  52.5% (740.5Gi / 1411.6Gi) waste: 671.1Gi
      Bottleneck: MEM
    rel-l-x86iavx512-8-64: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  96.0% (1354.7Gi / 1411.6Gi) waste: 56.9Gi
      Bottleneck: MEM

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 572

    Top 5 most efficient:
      #1 [5xl-x86iavx512-16-128, 2xl-x86iavx512-48-384]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi
      #2 [4xl-x86iavx512-16-128, 2xl-x86iavx512-32-256, 1xl-x86iavx512-48-384]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi
      #3 [3xl-x86iavx512-16-128, 4xl-x86iavx512-32-256]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi
      #4 [2xl-x86iavx512-16-128, 1xl-x86iavx512-32-256, 2xl-x86iavx512-48-384, 2xl-x86iavx512-8-64]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi
      #5 [2xl-x86iavx512-16-128, 1xl-x86iavx512-32-256, 2xl-x86iavx512-48-384, 1xl-x86iavx512-8-64, 1xrel-l-x86iavx512-8-64]
         CPU:  93.3%  MEM: 100.0%  waste: 12.8c + 57Mi

    Bottom 5 least efficient (money on the table):
      #1 [1xl-x86iavx512-16-128, 1xl-x86iavx512-32-256, 2xl-x86iavx512-48-384, 3xrel-l-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi
      #2 [3xl-x86iavx512-32-256, 1xl-x86iavx512-48-384, 3xl-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi
      #3 [3xl-x86iavx512-32-256, 1xl-x86iavx512-48-384, 2xl-x86iavx512-8-64, 1xrel-l-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi
      #4 [3xl-x86iavx512-32-256, 1xl-x86iavx512-48-384, 1xl-x86iavx512-8-64, 2xrel-l-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi
      #5 [3xl-x86iavx512-32-256, 1xl-x86iavx512-48-384, 3xrel-l-x86iavx512-8-64]
         CPU:  89.1%  MEM:  95.5%  waste: 20.8c + 64.1Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: r7g.16xlarge
  Total: 64 vCPU, 512Gi advertised (473.7Gi actual)
  Kubelet reserved: 230m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 63400m CPU (63.4 cores), 464.5Gi RAM

  Runners targeting this node:
    - l-arm64g3-61-463: 61320m CPU, 463.5Gi RAM (job: 61c+463.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-arm64g3-61-463: 1 pods
      CPU:  96.7% (61320m / 63400m) waste: 2080m (2.1 cores)
      MEM:  99.8% (463.5Gi / 464.5Gi) waste: 1.0Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-arm64g3-61-463]
         CPU:  96.7%  MEM:  99.8%  waste: 2.1c + 1.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: r7i.48xlarge
  Total: 192 vCPU, 1536Gi advertised (1420.8Gi actual)
  Kubelet reserved: 550m CPU, 8.3Gi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 191080m CPU (191.1 cores), 1411.6Gi RAM

  Runners targeting this node:
    - l-x86iamx-16-128: 16320m CPU, 128.5Gi RAM (job: 16c+128.0Gi, hooks: 320m+522Mi)
    - l-x86iamx-8-64: 8320m CPU, 64.5Gi RAM (job: 8c+64.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-x86iamx-16-128: 10 pods
      CPU:  85.4% (163200m / 191080m) waste: 27880m (27.9 cores)
      MEM:  91.0% (1285.1Gi / 1411.6Gi) waste: 126.5Gi
      Bottleneck: MEM
    l-x86iamx-8-64: 21 pods
      CPU:  91.4% (174720m / 191080m) waste: 16360m (16.4 cores)
      MEM:  96.0% (1354.7Gi / 1411.6Gi) waste: 56.9Gi
      Bottleneck: MEM

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 10

    Top 5 most efficient:
      #1 [1xl-x86iamx-16-128, 19xl-x86iamx-8-64]
         CPU:  91.3%  MEM:  95.9%  waste: 16.7c + 57.4Gi
      #2 [2xl-x86iamx-16-128, 17xl-x86iamx-8-64]
         CPU:  91.1%  MEM:  95.9%  waste: 17.0c + 57.9Gi
      #3 [3xl-x86iamx-16-128, 15xl-x86iamx-8-64]
         CPU:  90.9%  MEM:  95.9%  waste: 17.3c + 58.4Gi
      #4 [4xl-x86iamx-16-128, 13xl-x86iamx-8-64]
         CPU:  90.8%  MEM:  95.8%  waste: 17.6c + 59.0Gi
      #5 [5xl-x86iamx-16-128, 11xl-x86iamx-8-64]
         CPU:  90.6%  MEM:  95.8%  waste: 18.0c + 59.5Gi

    Bottom 5 least efficient (money on the table):
      #1 [6xl-x86iamx-16-128, 9xl-x86iamx-8-64]
         CPU:  90.4%  MEM:  95.8%  waste: 18.3c + 60.0Gi
      #2 [7xl-x86iamx-16-128, 7xl-x86iamx-8-64]
         CPU:  90.3%  MEM:  95.7%  waste: 18.6c + 60.5Gi
      #3 [8xl-x86iamx-16-128, 5xl-x86iamx-8-64]
         CPU:  90.1%  MEM:  95.7%  waste: 18.9c + 61.0Gi
      #4 [9xl-x86iamx-16-128, 3xl-x86iamx-8-64]
         CPU:  89.9%  MEM:  95.6%  waste: 19.2c + 61.5Gi
      #5 [10xl-x86iamx-16-128, 1xl-x86iamx-8-64]
         CPU:  89.8%  MEM:  95.6%  waste: 19.6c + 62.0Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Node Type: t4g.2xlarge
  Total: 8 vCPU, 32Gi advertised (29.6Gi actual)
  Kubelet reserved: 90m CPU, 993Mi RAM
  DaemonSet overhead: 370m CPU, 934Mi RAM
  Allocatable for runners: 7540m CPU (7.5 cores), 27.7Gi RAM

  Runners targeting this node:
    - l-arm64g2-6-25: 6320m CPU, 25.5Gi RAM (job: 6c+25.0Gi, hooks: 320m+522Mi)

  Homogeneous packing (single runner type fills the node):
    l-arm64g2-6-25: 1 pods
      CPU:  83.8% (6320m / 7540m) waste: 1220m (1.2 cores)
      MEM:  92.0% (25.5Gi / 27.7Gi) waste: 2.2Gi
      Bottleneck: CPU

  Maximal mixed combos (node fully packed, no room for another pod):
    Total maximal combos: 1

    Top 1 most efficient:
      #1 [1xl-arm64g2-6-25]
         CPU:  83.8%  MEM:  92.0%  waste: 1.2c + 2.2Gi

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Found 14 runner type(s) with homogeneous utilization below 90.0%

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Unused resource headroom per node (homogeneous packing only):

  Node Type                 Min CPU    Max CPU    Min MEM    Max MEM
  ────────────────────────────────────────────────────────────────
  c7a.48xlarge              4480m     96760m      3.3Gi    156.5Gi
  c7i.12xlarge              1120m      5840m      530Mi      2.5Gi
  c7i.metal-24xl            3000m      3000m      936Mi      936Mi
  g4dn.12xlarge             1970m      1970m      1.0Gi      1.0Gi
  g4dn.8xlarge              2010m      2010m      712Mi      712Mi
  g4dn.metal                 530m     34850m      708Mi    302.2Gi
  g5.12xlarge               1970m      1970m      616Mi      616Mi
  g5.48xlarge               1610m      1610m      627Mi      627Mi
  g5.8xlarge                2010m      2010m      824Mi      824Mi
  g6.12xlarge               1970m      1970m      1.0Gi      1.0Gi
  g6.8xlarge                2010m      2010m      824Mi      824Mi
  m6i.32xlarge              1920m      1920m      1.0Gi      1.0Gi
  m7g.8xlarge              15160m     15160m     52.1Gi     52.1Gi
  m7g.metal                 1080m      1080m      1.2Gi      1.2Gi
  m7i.48xlarge             16360m     29800m     18.5Gi     59.2Gi
  m8g.16xlarge              1080m      1080m      1.2Gi      1.2Gi
  m8g.48xlarge             11560m     11560m     13.6Gi     13.6Gi
  p4d.24xlarge              4610m      6850m     56.9Gi     60.4Gi
  p5.48xlarge              12370m     14610m     86.6Gi     90.2Gi
  p6-b200.48xlarge         12370m     14610m     86.6Gi     90.2Gi
  r7a.48xlarge             16360m     96760m     56.9Gi    671.1Gi
  r7g.16xlarge              2080m      2080m      1.0Gi      1.0Gi
  r7i.48xlarge             16360m     27880m     56.9Gi    126.5Gi
  t4g.2xlarge               1220m      1220m      2.2Gi      2.2Gi
  ────────────────────────────────────────────────────────────────
  WORST CASE                 530m     96760m      530Mi    671.1Gi

  The tightest node has only 530m CPU and 530Mi RAM free.
  Any new DaemonSet must fit within these limits or runners will fail to schedule.
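
Reader's note: the WORST CASE row appears to be a column-wise reduction of the table (smallest Min, largest Max). The sketch below illustrates that reading on three rows copied from the table; they were picked because they hold the extreme values, and the other rows are omitted for brevity. It is not the report's own code, and `parse_cpu_m`, `parse_mem_mi`, `worst_case`, and `daemonset_fits` are hypothetical helpers.

```python
from typing import Dict


def parse_cpu_m(text: str) -> int:
    """Parse a millicore quantity such as '530m'."""
    if not text.endswith("m"):
        raise ValueError(f"unexpected CPU quantity: {text!r}")
    return int(text[:-1])


def parse_mem_mi(text: str) -> float:
    """Parse a memory quantity such as '530Mi' or '2.5Gi' into MiB."""
    if text.endswith("Gi"):
        return float(text[:-2]) * 1024
    if text.endswith("Mi"):
        return float(text[:-2])
    raise ValueError(f"unexpected memory quantity: {text!r}")


# Three rows copied from the headroom table (not the full table).
ROWS = """\
c7i.12xlarge              1120m      5840m      530Mi      2.5Gi
g4dn.metal                 530m     34850m      708Mi    302.2Gi
r7a.48xlarge             16360m     96760m     56.9Gi    671.1Gi
"""


def worst_case(rows: str) -> Dict[str, float]:
    """Reduce the table column-wise: smallest minimum, largest maximum."""
    parsed = []
    for line in rows.strip().splitlines():
        _name, min_cpu, max_cpu, min_mem, max_mem = line.split()
        parsed.append((
            parse_cpu_m(min_cpu), parse_cpu_m(max_cpu),
            parse_mem_mi(min_mem), parse_mem_mi(max_mem),
        ))
    return {
        "min_cpu_m": min(p[0] for p in parsed),
        "max_cpu_m": max(p[1] for p in parsed),
        "min_mem_mi": min(p[2] for p in parsed),
        "max_mem_mi": max(p[3] for p in parsed),
    }


def daemonset_fits(cpu_m: int, mem_mi: float, worst: Dict[str, float]) -> bool:
    """Conservative check: the request must fit the tightest CPU and memory headroom."""
    return cpu_m <= worst["min_cpu_m"] and mem_mi <= worst["min_mem_mi"]


worst = worst_case(ROWS)
print(worst["min_cpu_m"], worst["min_mem_mi"])  # 530 530.0
print(daemonset_fits(100, 128, worst))          # True
print(daemonset_fits(600, 128, worst))          # False
```

The tightest CPU (530m, g4dn.metal) and the tightest memory (530Mi, c7i.12xlarge) come from different node types, so checking against both at once is stricter than any single node requires.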

github-actions · 2026-06-12T21:10:25Z

tofu plan — arc-cbr-production

✅ Plan succeeded · commit 808e0dbd · run log

Plan output

Installed 1 package in 1ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.
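
Reader's note: the warning above means the configured plugin cache directory does not exist on the runner. OpenTofu does not create that directory itself, so it has to exist before the CLI starts. The sketch below is one way a wrapper could do that. It assumes the path is supplied through the `TF_PLUGIN_CACHE_DIR` environment variable; this workflow may instead set `plugin_cache_dir` in a CLI configuration file, in which case the path would need to be read from there. `ensure_plugin_cache_dir` is a hypothetical helper, not part of this repository.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def ensure_plugin_cache_dir(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Create the plugin cache directory if TF_PLUGIN_CACHE_DIR is set.

    Returns the directory path, or None when the variable is unset or empty.
    """
    raw = env.get("TF_PLUGIN_CACHE_DIR")
    if not raw:
        return None
    path = Path(raw).expanduser()
    # parents=True builds intermediate directories such as .terraform.d;
    # exist_ok=True makes repeated calls harmless.
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    created = ensure_plugin_cache_dir()
    print(created if created else "TF_PLUGIN_CACHE_DIR is not set; nothing to do")
```

Calling this before `tofu init` or `tofu plan` should avoid the warning; the plan in this run still succeeded without the cache.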


Acquiring state lock. This may take a few moments...
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=527854a4-e335-4f95-bc89-1321cff7a478]
data.aws_availability_zones.available: Reading...
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3]
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role]
module.eks.data.aws_caller_identity.current: Reading...
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0e712dc7e743bbcf7]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-node-role]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-2]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNOLQFN6MU]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-eks-secrets]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=pytorch-arc-cbr-production-node-role:pytorch-arc-cbr-production-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-harbor-s3/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-032d4401e63f0c9b9]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-05e96ee7cb818e5c0]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-0d34063a19f4b07b4]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0ab11fcdb8d4ea113]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0d26e280575e8aaf4]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-0a583bbbcac436ebd]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0709abbcafa23aec0]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-01187bfaa68514400]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-01e479dcb5aedf696]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-0577a02acde719bff]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0992f582e9bf2836e]
module.vpc.aws_eip.nat_secondary["us-east-2c-2"]: Refreshing state... [id=eipalloc-07cfdb2fd5dc07459]
module.vpc.aws_eip.nat_secondary["us-east-2c-6"]: Refreshing state... [id=eipalloc-0aede78edc69cf695]
module.vpc.aws_eip.nat_secondary["us-east-2a-1"]: Refreshing state... [id=eipalloc-0f2b00a9ac31df215]
module.vpc.aws_eip.nat_secondary["us-east-2a-3"]: Refreshing state... [id=eipalloc-034d5e1f5a2fcb795]
module.vpc.aws_eip.nat_secondary["us-east-2b-4"]: Refreshing state... [id=eipalloc-0de33181548ac2e5a]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-009f1fe7d56695348]
module.vpc.aws_eip.nat_secondary["us-east-2b-3"]: Refreshing state... [id=eipalloc-021ee6c9f1d20b71a]
module.vpc.aws_eip.nat_secondary["us-east-2c-1"]: Refreshing state... [id=eipalloc-06a980076e99cda81]
module.vpc.aws_eip.nat_secondary["us-east-2c-0"]: Refreshing state... [id=eipalloc-03542e74755fc105b]
module.vpc.aws_eip.nat_secondary["us-east-2c-4"]: Refreshing state... [id=eipalloc-0cc3dadec18bbb3f3]
module.vpc.aws_eip.nat_secondary["us-east-2b-0"]: Refreshing state... [id=eipalloc-0cead990d60ce181e]
module.vpc.aws_eip.nat_secondary["us-east-2c-5"]: Refreshing state... [id=eipalloc-02825435a2786b3d8]
module.vpc.aws_eip.nat_secondary["us-east-2b-5"]: Refreshing state... [id=eipalloc-0cde9a6463901f1e1]
module.vpc.aws_eip.nat_secondary["us-east-2b-6"]: Refreshing state... [id=eipalloc-06b7b88826199a232]
module.vpc.aws_eip.nat_secondary["us-east-2a-6"]: Refreshing state... [id=eipalloc-0113c95dbdec2f879]
module.vpc.aws_eip.nat_secondary["us-east-2a-2"]: Refreshing state... [id=eipalloc-09b15a770e0c6d552]
module.vpc.aws_eip.nat_secondary["us-east-2a-5"]: Refreshing state... [id=eipalloc-0bd9bf54bd6010323]
module.vpc.aws_eip.nat_secondary["us-east-2b-1"]: Refreshing state... [id=eipalloc-0e67c0a8cd8c990da]
module.vpc.aws_eip.nat_secondary["us-east-2b-2"]: Refreshing state... [id=eipalloc-063bee447616351f9]
module.vpc.aws_eip.nat_secondary["us-east-2a-0"]: Refreshing state... [id=eipalloc-086a011b3c26c0dd7]
module.vpc.aws_eip.nat_secondary["us-east-2a-4"]: Refreshing state... [id=eipalloc-067d535102a61d1a8]
module.vpc.aws_eip.nat_secondary["us-east-2c-3"]: Refreshing state... [id=eipalloc-0d3a71569b2f687be]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0fddf2f74e7e978c7]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-084975a7f7af2696e]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-0ce4fba002d90e7d5]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-07d5cd4c479c827ab]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-08e264cbbd47be1ee]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-0f7b8f4473e5790df]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0ad75b2f5282877db]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production:kube-proxy]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production:vpc-cni]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-0c7ecd4166a01e5f0]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-0cb3785c433ed7718]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-01d38d41a7ca82a08]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-0b820cd15307b6d57]
module.eks.data.tls_certificate.cluster[0]: Read complete after 0s [id=033a163afb2babc26f7883e642621ac361c93d61]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-east-2.amazonaws.com/id/0A621339248958D6D5F2FF084BD185B5]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0beb143017359bda1]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-0b6e08b4b0dc968c0]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-097abe4676c74f71b]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production:pytorch-arc-cbr-production-base-nodes]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=2879363015]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-harbor-registry/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-harbor-registry]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production:coredns]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (arc-cbr-production) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-rebalance]
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-instance-state-change]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-scheduled-change]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-spot-interruption]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/308535385114/pytorch-arc-cbr-production-karpenter]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-2.amazonaws.com/308535385114/pytorch-arc-cbr-production-karpenter]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-spot-interruption-KarpenterSpotInterruption]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-instance-state-change-KarpenterInstanceStateChange]
data.terraform_remote_state.base: Read complete after 2s
aws_ec2_tag.subnet_karpenter_discovery["subnet-0992f582e9bf2836e"]: Refreshing state... [id=subnet-0992f582e9bf2836e,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0709abbcafa23aec0"]: Refreshing state... [id=subnet-0709abbcafa23aec0,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0577a02acde719bff"]: Refreshing state... [id=subnet-0577a02acde719bff,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-karpenter-controller]
aws_iam_role.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-controller]
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-01ec5f742ae028981,karpenter.sh/discovery]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-karpenter-controller-20260518021844404100000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (arc-cbr-production) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-pypi-wants-collector-s3]
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-pypi-wheel-syncer-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-0deb818bbf18764de]
data.terraform_remote_state.base: Read complete after 1s
aws_iam_role.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wheel-syncer-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-efs-csi-driver-role]
aws_iam_role.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wants-collector-role]
aws_security_group.efs: Refreshing state... [id=sg-0979eb5e3d9d3db9f]
aws_efs_mount_target.pypi_cache["subnet-0709abbcafa23aec0"]: Refreshing state... [id=fsmt-08cd5108febbacef9]
aws_efs_mount_target.pypi_cache["subnet-0992f582e9bf2836e"]: Refreshing state... [id=fsmt-03523586bb4ff0c46]
aws_efs_mount_target.pypi_cache["subnet-0577a02acde719bff"]: Refreshing state... [id=fsmt-07d7b111b9cd6684e]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wheel-syncer-role-20260518023249929400000004]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-efs-csi-driver-role-20260518023249955700000005]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-pypi-wants-collector-role-20260518023249903900000003]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production:aws-efs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

github-actions · 2026-06-12T21:11:41Z

tofu plan — arc-cbr-production-uw1

✅ Plan succeeded · commit 808e0dbd · run log

Plan output

Installed 1 package in 2ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod-uw1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (arc-cbr-production-uw1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.eks.data.aws_caller_identity.current: Reading...
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-0121d1038d393182a]
data.aws_availability_zones.available: Reading...
module.eks.aws_iam_role.cluster: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role]
module.eks.aws_iam_role.node: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role]
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-s3]
module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=1fb5d763-c5cd-4de5-bf40-712df992288c]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNFWBLKNFS]
data.aws_availability_zones.available: Read complete after 1s [id=us-west-1]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/pytorch-arc-cbr-production-uw1-eks-secrets]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=pytorch-arc-cbr-production-uw1-node-role:pytorch-arc-cbr-production-uw1-node-cni-ipv6]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-07fd8394a1d58b614]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-0b3b22b995e71d8d9]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-07b06397ce403fa53]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-harbor-registry]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0a13e7b49c841e497]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0a8410ffa0f0014a7]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0bd275a35f8e7ef65]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-08861bee27120b994]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-05f5edbf2c6678c03]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-0ce35bb011df0cfdb]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-06d137da3460167c4]
module.vpc.aws_eip.nat_secondary["us-west-1c-5"]: Refreshing state... [id=eipalloc-0635efedc10ee5f66]
module.vpc.aws_eip.nat_secondary["us-west-1a-0"]: Refreshing state... [id=eipalloc-0e3ca79e34012a238]
module.vpc.aws_eip.nat_secondary["us-west-1a-6"]: Refreshing state... [id=eipalloc-08763a35db0a26caa]
module.vpc.aws_eip.nat_secondary["us-west-1a-4"]: Refreshing state... [id=eipalloc-0dfae88698dce850e]
module.vpc.aws_eip.nat_secondary["us-west-1c-3"]: Refreshing state... [id=eipalloc-09f89978685e7f3c7]
module.vpc.aws_eip.nat_secondary["us-west-1a-5"]: Refreshing state... [id=eipalloc-059986f686b188dc2]
module.vpc.aws_eip.nat_secondary["us-west-1c-2"]: Refreshing state... [id=eipalloc-0f2e15b6a36b52fac]
module.vpc.aws_eip.nat_secondary["us-west-1c-6"]: Refreshing state... [id=eipalloc-0cf91a032d10f4ec5]
module.vpc.aws_eip.nat_secondary["us-west-1c-4"]: Refreshing state... [id=eipalloc-0dfaa16c61333ceb3]
module.vpc.aws_eip.nat_secondary["us-west-1a-3"]: Refreshing state... [id=eipalloc-05a2bad636af56f4d]
module.vpc.aws_eip.nat_secondary["us-west-1a-2"]: Refreshing state... [id=eipalloc-0647e169131be5893]
module.vpc.aws_eip.nat_secondary["us-west-1c-0"]: Refreshing state... [id=eipalloc-0d565f5bf077b05cf]
module.vpc.aws_eip.nat_secondary["us-west-1c-1"]: Refreshing state... [id=eipalloc-0bd09c7f2dcaa0a46]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-s3-20260519191031756900000001]
module.vpc.aws_eip.nat_secondary["us-west-1a-1"]: Refreshing state... [id=eipalloc-012ac413772344fea]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-00184fa8d73e575c9]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0f79a2ac72857a304]
module.eks.aws_eks_cluster.this: Refreshing state... [id=pytorch-arc-cbr-production-uw1]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0c336634317cc9f35]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-01ec520e3931f5f6a]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-01165f36472c0a780]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-06e17b37b87d890f2]
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production-uw1:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=pytorch-arc-cbr-production-uw1:kube-proxy]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=pytorch-arc-cbr-production-uw1:vpc-cni]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-066ae5f473a2b07c0]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-0cc835aef3e3bcc21]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-02e4c54e5fa3b4f8a]
module.eks.aws_eks_node_group.base: Refreshing state... [id=pytorch-arc-cbr-production-uw1:pytorch-arc-cbr-production-uw1-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 1s [id=ab5db6c82031e2d229412c67921160a3b3af073b]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-west-1.amazonaws.com/id/ED52EC64FF5CFAB4151C6E4B5DE279BD]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=3969145930]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-uw1-ebs-csi-driver-role]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=pytorch-arc-cbr-production-uw1#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=pytorch-arc-cbr-production-uw1:coredns]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=pytorch-arc-cbr-production-uw1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=pytorch-arc-cbr-production-uw1-harbor-registry/arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-harbor-registry]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1:aws-ebs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (arc-cbr-production-uw1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-instance-state-change]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-spot-interruption]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-scheduled-change]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-west-1.amazonaws.com/308535385114/pytorch-arc-cbr-production-uw1-karpenter]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-rebalance]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-west-1.amazonaws.com/308535385114/pytorch-arc-cbr-production-uw1-karpenter]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-spot-interruption-KarpenterSpotInterruption]
data.terraform_remote_state.base: Read complete after 1s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-058909cc1cdc63fad,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0a13e7b49c841e497"]: Refreshing state... [id=subnet-0a13e7b49c841e497,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-08861bee27120b994"]: Refreshing state... [id=subnet-08861bee27120b994,karpenter.sh/discovery]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-karpenter-controller]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=pytorch-arc-cbr-production-uw1-karpenter-controller-20260519195229107000000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (arc-cbr-production-uw1) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-s3]
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/pytorch-arc-cbr-production-uw1-pypi-wants-collector-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-0da5eaf2022d80aa0]
data.terraform_remote_state.base: Read complete after 1s
aws_iam_role.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1-efs-csi-driver-role]
aws_iam_role.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wants-collector-role]
aws_iam_role.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-role]
aws_security_group.efs: Refreshing state... [id=sg-01c1f3fa51705db76]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1-efs-csi-driver-role-20260519200350826400000005]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wheel-syncer-role-20260519200350777100000003]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=pytorch-arc-cbr-production-uw1-pypi-wants-collector-role-20260519200350781900000004]
aws_efs_mount_target.pypi_cache["subnet-08861bee27120b994"]: Refreshing state... [id=fsmt-00708cc923d4d2055]
aws_efs_mount_target.pypi_cache["subnet-0a13e7b49c841e497"]: Refreshing state... [id=fsmt-089fd42858a5a85ab]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=pytorch-arc-cbr-production-uw1:aws-efs-csi-driver]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

github-actions · 2026-06-12T21:12:55Z

tofu plan — meta-prod-aws-ue1

✅ Plan succeeded · commit 808e0dbd · run log

Plan output

Installed 1 package in 1ms
{
    "BucketArn": "arn:aws:s3:::ciforge-tfstate-arc-cbr-prod-ue1",
    "BucketRegion": "us-west-2",
    "AccessPointAlias": false
}
━━━ PLAN: Base (meta-prod-aws-ue1) ━━━
There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache cannot be opened: stat /home/runner/work/ci-infra/ci-infra/osdc/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, OpenTofu may not behave as intended.


module.eks.data.aws_ami.eks_optimized_al2023: Reading...
module.harbor.aws_iam_user.harbor_s3: Refreshing state... [id=meta-prod-aws-ue1-harbor-s3]
module.eks.aws_iam_role.cluster: Refreshing state... [id=meta-prod-aws-ue1-cluster-role]
module.eks.aws_iam_role.node: Refreshing state... [id=meta-prod-aws-ue1-node-role]
module.eks.data.aws_caller_identity.current: Reading...
data.aws_availability_zones.available: Reading...
module.eks.aws_kms_key.eks_secrets[0]: Refreshing state... [id=9274017b-776a-41bd-9f11-d118a1174159]
module.vpc.aws_vpc.this: Refreshing state... [id=vpc-046818728dce02486]
module.harbor.aws_s3_bucket.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.eks.data.aws_caller_identity.current: Read complete after 0s [id=308535385114]
module.eks.aws_kms_alias.eks_secrets[0]: Refreshing state... [id=alias/meta-prod-aws-ue1-eks-secrets]
module.harbor.aws_iam_access_key.harbor_s3: Refreshing state... [id=AKIAUPVRELQNGRUDTXPT]
data.aws_availability_zones.available: Read complete after 0s [id=us-east-1]
module.eks.aws_iam_role_policy_attachment.cluster_policy: Refreshing state... [id=meta-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy]
module.eks.aws_iam_role_policy_attachment.vpc_resource_controller: Refreshing state... [id=meta-prod-aws-ue1-cluster-role/arn:aws:iam::aws:policy/AmazonEKSVPCResourceController]
module.eks.aws_iam_role_policy_attachment.cni_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy]
module.eks.aws_iam_role_policy.node_cni_ipv6: Refreshing state... [id=meta-prod-aws-ue1-node-role:meta-prod-aws-ue1-node-cni-ipv6]
module.eks.aws_iam_role_policy_attachment.ssm_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore]
module.eks.aws_iam_role_policy_attachment.ecr_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly]
module.eks.aws_iam_role_policy_attachment.node_policy: Refreshing state... [id=meta-prod-aws-ue1-node-role/arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy]
module.harbor.aws_s3_bucket_public_access_block.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_iam_policy.harbor_registry: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_s3_bucket_server_side_encryption_configuration.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.harbor.aws_iam_user_policy_attachment.harbor_s3: Refreshing state... [id=meta-prod-aws-ue1-harbor-s3/arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.eks.data.aws_ami.eks_optimized_al2023: Read complete after 1s [id=ami-0dafeb02304897431]
module.vpc.aws_internet_gateway.this: Refreshing state... [id=igw-0cf3d9cf37ee998b6]
module.vpc.aws_egress_only_internet_gateway.this: Refreshing state... [id=eigw-0ce44cb6446f3c1b6]
module.vpc.aws_subnet.private[2]: Refreshing state... [id=subnet-02ce11d6646870431]
module.vpc.aws_route_table.public: Refreshing state... [id=rtb-0beb5fc44f0ee165f]
module.vpc.aws_subnet.private[1]: Refreshing state... [id=subnet-0348c5058db524cd2]
module.vpc.aws_subnet.private[0]: Refreshing state... [id=subnet-0d65ec2dd49f0d87c]
module.vpc.aws_eip.nat_secondary["us-east-1a-0"]: Refreshing state... [id=eipalloc-0c8a6faed0a97479d]
module.vpc.aws_eip.nat_secondary["us-east-1c-0"]: Refreshing state... [id=eipalloc-05844040c7248f44f]
module.vpc.aws_eip.nat_secondary["us-east-1c-2"]: Refreshing state... [id=eipalloc-025ef0e1813277c67]
module.vpc.aws_eip.nat_secondary["us-east-1b-4"]: Refreshing state... [id=eipalloc-0aba12aa23c11d20c]
module.vpc.aws_eip.nat_secondary["us-east-1a-5"]: Refreshing state... [id=eipalloc-01f89a7c130d2a810]
module.vpc.aws_eip.nat_secondary["us-east-1c-4"]: Refreshing state... [id=eipalloc-00c5df9f3b60f353d]
module.vpc.aws_eip.nat_secondary["us-east-1c-1"]: Refreshing state... [id=eipalloc-0cb5208c5f775baf6]
module.vpc.aws_eip.nat_secondary["us-east-1a-6"]: Refreshing state... [id=eipalloc-02e84a51a14c9cbda]
module.vpc.aws_eip.nat_secondary["us-east-1a-3"]: Refreshing state... [id=eipalloc-0bda13d7b70c00c00]
module.vpc.aws_eip.nat_secondary["us-east-1b-0"]: Refreshing state... [id=eipalloc-0bcfe1f98793e1b12]
module.vpc.aws_eip.nat_secondary["us-east-1a-1"]: Refreshing state... [id=eipalloc-08c7bd3306cf687ca]
module.vpc.aws_eip.nat_secondary["us-east-1b-6"]: Refreshing state... [id=eipalloc-0f922f499d32f1368]
module.vpc.aws_eip.nat_secondary["us-east-1b-1"]: Refreshing state... [id=eipalloc-0d095305019486ae6]
module.vpc.aws_eip.nat_secondary["us-east-1b-2"]: Refreshing state... [id=eipalloc-0f0b720f4cca62ec7]
module.vpc.aws_eip.nat_secondary["us-east-1a-4"]: Refreshing state... [id=eipalloc-09fa171393c3a7cfb]
module.vpc.aws_eip.nat_secondary["us-east-1b-3"]: Refreshing state... [id=eipalloc-0c8291ee817240e1f]
module.vpc.aws_eip.nat_secondary["us-east-1c-3"]: Refreshing state... [id=eipalloc-0af54aa2e5f40dfa4]
module.vpc.aws_eip.nat_secondary["us-east-1a-2"]: Refreshing state... [id=eipalloc-080ec4e265ebdc5ad]
module.vpc.aws_eip.nat_secondary["us-east-1b-5"]: Refreshing state... [id=eipalloc-0d078dc6f07628714]
module.vpc.aws_eip.nat_secondary["us-east-1c-5"]: Refreshing state... [id=eipalloc-04fe645562f597aaa]
module.vpc.aws_eip.nat_secondary["us-east-1c-6"]: Refreshing state... [id=eipalloc-0d22d3aa0667a1070]
module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-0eafd792589fbb363]
module.vpc.aws_eip.nat[1]: Refreshing state... [id=eipalloc-00c2e2605c4dea199]
module.vpc.aws_eip.nat[2]: Refreshing state... [id=eipalloc-033772b4490df1b41]
module.vpc.aws_subnet.public[1]: Refreshing state... [id=subnet-078f44b58c8b48ade]
module.vpc.aws_subnet.public[2]: Refreshing state... [id=subnet-07bfd0f170c3b3406]
module.vpc.aws_subnet.public[0]: Refreshing state... [id=subnet-0f922406e02ecba1d]
module.eks.aws_eks_cluster.this: Refreshing state... [id=meta-prod-aws-ue1]
module.vpc.aws_route_table_association.public[2]: Refreshing state... [id=rtbassoc-05e7e66e960593972]
module.vpc.aws_route_table_association.public[0]: Refreshing state... [id=rtbassoc-05da47c4ed26ae390]
module.vpc.aws_route_table_association.public[1]: Refreshing state... [id=rtbassoc-0616491b7baeab47f]
module.eks.aws_eks_addon.kube_proxy: Refreshing state... [id=meta-prod-aws-ue1:kube-proxy]
module.eks.data.tls_certificate.cluster[0]: Reading...
module.eks.aws_eks_access_entry.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=meta-prod-aws-ue1:arn:aws:iam::308535385114:role/osdc_gha_prod]
module.eks.aws_eks_addon.vpc_cni: Refreshing state... [id=meta-prod-aws-ue1:vpc-cni]
module.eks.aws_launch_template.base: Refreshing state... [id=lt-043779597e3b5a7fd]
module.vpc.aws_nat_gateway.this[2]: Refreshing state... [id=nat-09414719983019b49]
module.vpc.aws_nat_gateway.this[0]: Refreshing state... [id=nat-025de56c0aac8d3f0]
module.vpc.aws_nat_gateway.this[1]: Refreshing state... [id=nat-0cff785d8001fc914]
module.eks.aws_eks_node_group.base: Refreshing state... [id=meta-prod-aws-ue1:meta-prod-aws-ue1-base-nodes]
module.eks.data.tls_certificate.cluster[0]: Read complete after 1s [id=b1b539daa206035ae3c3e28288b0681fa1b462f3]
module.eks.aws_iam_openid_connect_provider.cluster[0]: Refreshing state... [id=arn:aws:iam::308535385114:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/6C84A48E1BF23A027C1E78912A368743]
module.vpc.aws_route_table.private[2]: Refreshing state... [id=rtb-05d5b7a41aa6323ed]
module.vpc.aws_route_table.private[1]: Refreshing state... [id=rtb-0c665948be8d0282e]
module.vpc.aws_route_table.private[0]: Refreshing state... [id=rtb-09287d705ce4a88bc]
module.harbor.aws_iam_role.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry]
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Reading...
module.eks.data.aws_iam_policy_document.ebs_csi_assume_role[0]: Read complete after 0s [id=3022997555]
module.eks.aws_iam_role.ebs_csi_driver[0]: Refreshing state... [id=meta-prod-aws-ue1-ebs-csi-driver-role]
module.vpc.aws_route_table_association.private[0]: Refreshing state... [id=rtbassoc-02a8683fa7258f295]
module.vpc.aws_route_table_association.private[1]: Refreshing state... [id=rtbassoc-09dca398d838d4247]
module.vpc.aws_route_table_association.private[2]: Refreshing state... [id=rtbassoc-0306281246323bd27]
module.eks.aws_eks_addon.coredns: Refreshing state... [id=meta-prod-aws-ue1:coredns]
module.eks.aws_iam_role_policy_attachment.ebs_csi_driver[0]: Refreshing state... [id=meta-prod-aws-ue1-ebs-csi-driver-role/arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy]
module.eks.aws_eks_addon.ebs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1:aws-ebs-csi-driver]
module.harbor.aws_iam_role_policy_attachment.harbor_registry: Refreshing state... [id=meta-prod-aws-ue1-harbor-registry/arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-harbor-registry]
module.eks.aws_eks_access_policy_association.cluster_admin["osdc_gha_prod"]: Refreshing state... [id=meta-prod-aws-ue1#arn:aws:iam::308535385114:role/osdc_gha_prod#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module karpenter (meta-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
aws_cloudwatch_event_rule.instance_state_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-instance-state-change]
aws_cloudwatch_event_rule.rebalance: Refreshing state... [id=meta-prod-aws-ue1-karpenter-rebalance]
aws_cloudwatch_event_rule.spot_interruption: Refreshing state... [id=meta-prod-aws-ue1-karpenter-spot-interruption]
aws_sqs_queue.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/308535385114/meta-prod-aws-ue1-karpenter]
aws_cloudwatch_event_rule.scheduled_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-scheduled-change]
aws_sqs_queue_policy.karpenter: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/308535385114/meta-prod-aws-ue1-karpenter]
aws_cloudwatch_event_target.instance_state_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-instance-state-change-KarpenterInstanceStateChange]
aws_cloudwatch_event_target.scheduled_change: Refreshing state... [id=meta-prod-aws-ue1-karpenter-scheduled-change-KarpenterScheduledChange]
aws_cloudwatch_event_target.rebalance: Refreshing state... [id=meta-prod-aws-ue1-karpenter-rebalance-KarpenterRebalance]
aws_cloudwatch_event_target.spot_interruption: Refreshing state... [id=meta-prod-aws-ue1-karpenter-spot-interruption-KarpenterSpotInterruption]
data.terraform_remote_state.base: Read complete after 2s
aws_ec2_tag.cluster_sg_karpenter: Refreshing state... [id=sg-016f4a0d209f3e4a9,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0348c5058db524cd2"]: Refreshing state... [id=subnet-0348c5058db524cd2,karpenter.sh/discovery]
aws_ec2_tag.subnet_karpenter_discovery["subnet-0d65ec2dd49f0d87c"]: Refreshing state... [id=subnet-0d65ec2dd49f0d87c,karpenter.sh/discovery]
aws_iam_role.karpenter_controller: Refreshing state... [id=meta-prod-aws-ue1-karpenter-controller]
aws_iam_policy.karpenter_controller: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-karpenter-controller]
aws_ec2_tag.subnet_karpenter_discovery["subnet-02ce11d6646870431"]: Refreshing state... [id=subnet-02ce11d6646870431,karpenter.sh/discovery]
aws_iam_role_policy_attachment.karpenter_controller: Refreshing state... [id=meta-prod-aws-ue1-karpenter-controller-20260528200455768400000001]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

━━━ PLAN: Module pypi-cache (meta-prod-aws-ue1) ━━━
data.terraform_remote_state.base: Reading...
aws_iam_policy.wants_collector: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-pypi-wants-collector-s3]
aws_iam_policy.wheel_syncer: Refreshing state... [id=arn:aws:iam::308535385114:policy/meta-prod-aws-ue1-pypi-wheel-syncer-s3]
aws_efs_file_system.pypi_cache: Refreshing state... [id=fs-023e57b36ec1cd426]
data.terraform_remote_state.base: Read complete after 1s
aws_iam_role.wants_collector: Refreshing state... [id=meta-prod-aws-ue1-pypi-wants-collector-role]
aws_iam_role.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1-efs-csi-driver-role]
aws_security_group.efs: Refreshing state... [id=sg-0bc06caa62214c9b7]
aws_iam_role.wheel_syncer: Refreshing state... [id=meta-prod-aws-ue1-pypi-wheel-syncer-role]
aws_iam_role_policy_attachment.wants_collector: Refreshing state... [id=meta-prod-aws-ue1-pypi-wants-collector-role-20260528201106192600000004]
aws_iam_role_policy_attachment.wheel_syncer: Refreshing state... [id=meta-prod-aws-ue1-pypi-wheel-syncer-role-20260528201106257700000005]
aws_iam_role_policy_attachment.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1-efs-csi-driver-role-20260528201106116400000003]
aws_eks_addon.efs_csi_driver: Refreshing state... [id=meta-prod-aws-ue1:aws-efs-csi-driver]
aws_efs_mount_target.pypi_cache["subnet-02ce11d6646870431"]: Refreshing state... [id=fsmt-06a05c001541338d2]
aws_efs_mount_target.pypi_cache["subnet-0348c5058db524cd2"]: Refreshing state... [id=fsmt-0500c573cafe66133]
aws_efs_mount_target.pypi_cache["subnet-0d65ec2dd49f0d87c"]: Refreshing state... [id=fsmt-0ffaedc58eceb7749]

No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

[ghstack-poisoned]

Temporary test configuration: DO NOT MERGE. Parallel to the A100 [TEST-ONLY]
commit: the same NUMA pipeline, validated on g4dn.metal (T4) because AWS does
not offer A100/p4d in us-west-1 (arc-staging's region), whereas g4dn.metal is
available on-demand there.

- Add nfd + numa-scheduler to arc-staging modules; remove from prod clusters
- Point NFD topology-updater + taint-remover nodeSelector at g4dn-metal-numa
- Gate the nfd-topology startup taint on the g4dn-metal-numa fleet
- Add g4dn-metal-numa nodepool: single-numa-node, capped to ONE node
  (limits.nvidia.com/gpu: 8) so 1-GPU + 4-GPU runners pack one 2-NUMA box
- Add a nodepool `limits` passthrough to generate_nodepools.py
- Add 1-GPU and 4-GPU T4 runner defs (4-GPU uses scheduler_name: numa-scheduler)
- Add cleanup-arc-staging.sh for teardown

g4dn.metal = 2 sockets x 4 T4 (2 NUMA x 4 GPU), topologically identical to
p4d/p5. cpuManagerPolicy=static and Guaranteed-QoS runner pods are already in
place, so CPU+GPU NUMA alignment applies without workload changes.

Cleanup: bash modules/nfd/scripts/cleanup-arc-staging.sh
Then drop this commit: git checkout numa-aware-scheduling (or git reset --hard HEAD~1)

ghstack-source-id: a9e35b6
Pull-Request: #748

[ghstack-poisoned]

Temporary test configuration — DO NOT MERGE. Parallel to the A100 [TEST-ONLY] commit: the same NUMA pipeline, validated on g4dn.metal (T4) because AWS does not offer A100/p4d in us-west-1 (arc-staging's region), whereas g4dn.metal is available on-demand there.

- Add nfd + numa-scheduler to arc-staging modules; remove from prod clusters
- Point NFD topology-updater + taint-remover nodeSelector at g4dn-metal-numa
- Gate the nfd-topology startup taint on the g4dn-metal-numa fleet
- Add g4dn-metal-numa nodepool: single-numa-node, capped to ONE node (limits.nvidia.com/gpu: 8) so 1-GPU + 4-GPU runners pack one 2-NUMA box
- Add a nodepool `limits` passthrough to generate_nodepools.py
- Add 1-GPU and 4-GPU T4 runner defs (4-GPU uses scheduler_name: numa-scheduler)
- Add cleanup-arc-staging.sh for teardown

g4dn.metal = 2 sockets x 4 T4 (2 NUMA x 4 GPU), topologically identical to p4d/p5. cpuManagerPolicy=static and Guaranteed-QoS runner pods are already in place, so CPU+GPU NUMA alignment applies without workload changes.

Cleanup: bash modules/nfd/scripts/cleanup-arc-staging.sh
Then drop this commit: git checkout numa-aware-scheduling (or git reset --hard HEAD~1)

ghstack-source-id: 7af6767
Pull-Request: #748

[ghstack-poisoned]

Temporary test configuration — DO NOT MERGE. Parallel to the A100 [TEST-ONLY] commit: the same NUMA pipeline, validated on g4dn.metal (T4) because AWS does not offer A100/p4d in us-west-1 (arc-staging's region), whereas g4dn.metal is available on-demand there.

- Add nfd + numa-scheduler to arc-staging modules; remove from prod clusters
- Point NFD topology-updater + taint-remover nodeSelector at g4dn-metal-numa
- Gate the nfd-topology startup taint on the g4dn-metal-numa fleet
- Add g4dn-metal-numa nodepool: single-numa-node, capped to ONE node (limits.nvidia.com/gpu: 8) so 1-GPU + 4-GPU runners pack one 2-NUMA box
- Add a nodepool `limits` passthrough to generate_nodepools.py
- Add 1-GPU and 4-GPU T4 runner defs (4-GPU uses scheduler_name: numa-scheduler)
- Add cleanup-arc-staging.sh for teardown

g4dn.metal = 2 sockets x 4 T4 (2 NUMA x 4 GPU), topologically identical to p4d/p5. cpuManagerPolicy=static and Guaranteed-QoS runner pods are already in place, so CPU+GPU NUMA alignment applies without workload changes.

Cleanup: bash modules/nfd/scripts/cleanup-arc-staging.sh
Then drop this commit: git checkout numa-aware-scheduling (or git reset --hard HEAD~1)

ghstack-source-id: 7fa073e
Pull-Request: #748

[ghstack-poisoned]

Temporary test configuration — DO NOT MERGE. Parallel to the A100 [TEST-ONLY] commit: the same NUMA pipeline, validated on g4dn.metal (T4) because AWS does not offer A100/p4d in us-west-1 (arc-staging's region), whereas g4dn.metal is available on-demand there.

- Add nfd + numa-scheduler to arc-staging modules; remove from prod clusters
- Point NFD topology-updater + taint-remover nodeSelector at g4dn-metal-numa
- Gate the nfd-topology startup taint on the g4dn-metal-numa fleet
- Add g4dn-metal-numa nodepool: single-numa-node, capped to ONE node (limits.nvidia.com/gpu: 8) so 1-GPU + 4-GPU runners pack one 2-NUMA box
- Add a nodepool `limits` passthrough to generate_nodepools.py
- Add 1-GPU and 4-GPU T4 runner defs (4-GPU uses scheduler_name: numa-scheduler)
- Add cleanup-arc-staging.sh for teardown

g4dn.metal = 2 sockets x 4 T4 (2 NUMA x 4 GPU), topologically identical to p4d/p5. cpuManagerPolicy=static and Guaranteed-QoS runner pods are already in place, so CPU+GPU NUMA alignment applies without workload changes.

Cleanup: bash modules/nfd/scripts/cleanup-arc-staging.sh
Then drop this commit: git checkout numa-aware-scheduling (or git reset --hard HEAD~1)

ghstack-source-id: c63197b
Pull-Request: #748
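The packing claim in the commit message (1-GPU and 4-GPU runners sharing one 2-NUMA x 4-GPU box under single-numa-node) can be checked with a little arithmetic. The sketch below is a toy model only, not the numa-scheduler's algorithm: `place` is a hypothetical first-fit helper that admits a request only when one NUMA node alone can serve all of its GPUs.

```python
from typing import List, Optional


def place(gpus_needed: int, free_per_numa: List[int]) -> Optional[int]:
    """First-fit placement under a single-NUMA-node constraint.

    Returns the index of the NUMA node that serves the whole request and
    decrements its free count, or None if no single node has enough GPUs.
    """
    for node, free in enumerate(free_per_numa):
        if free >= gpus_needed:
            free_per_numa[node] -= gpus_needed
            return node
    return None


# g4dn.metal as described above: 2 NUMA nodes with 4 GPUs each.
free = [4, 4]
first = place(1, free)   # the 1-GPU runner lands on NUMA node 0
second = place(4, free)  # the 4-GPU runner still fits, on NUMA node 1

# Fragmented case: 6 GPUs are free in total, yet no single node has 4,
# so a 4-GPU request cannot be aligned to one NUMA node.
fragmented = [3, 3]
third = place(4, fragmented)
```

The fragmented case is the one a topology-unaware scheduler cannot see: it counts 6 free GPUs on the node and admits the pod, after which the kubelet rejects it because no single NUMA node can satisfy the request.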

Update

f42f04b

[ghstack-poisoned]

georgehong temporarily deployed to osdc-staging June 12, 2026 21:09 — with GitHub Actions Inactive

This was referenced Jun 12, 2026

Add NFD topology-updater module with startup taint remover (#696) #738

Draft

Add numa-scheduler module (#696) #739

Draft

Enable NUMA-aware scheduling for H100 4-GPU runner (#696) #740

Draft

georgehong temporarily deployed to osdc-staging June 12, 2026 21:10 — with GitHub Actions Inactive

georgehong temporarily deployed to osdc-staging June 12, 2026 21:11 — with GitHub Actions Inactive

Update

310de9a

[ghstack-poisoned]

georgehong mentioned this pull request Jun 15, 2026

arc-runners: support per-def workflow schedulerName (#696) #759

Draft

georgehong had a problem deploying to osdc-staging June 15, 2026 20:48 — with GitHub Actions Error

georgehong had a problem deploying to osdc-staging June 15, 2026 20:49 — with GitHub Actions Error

Update

5364abf

[ghstack-poisoned]

georgehong had a problem deploying to osdc-staging June 15, 2026 21:01 — with GitHub Actions Error

georgehong temporarily deployed to osdc-staging June 15, 2026 21:01 — with GitHub Actions Inactive

georgehong had a problem deploying to osdc-staging June 15, 2026 21:02 — with GitHub Actions Error

Update

a6bb182

[ghstack-poisoned]

georgehong had a problem deploying to osdc-staging June 15, 2026 22:19 — with GitHub Actions Error

georgehong temporarily deployed to osdc-staging June 15, 2026 22:20 — with GitHub Actions Inactive

Update

aa4473c

[ghstack-poisoned]

georgehong temporarily deployed to osdc-staging June 15, 2026 23:07 — with GitHub Actions Inactive

georgehong temporarily deployed to osdc-staging June 15, 2026 23:08 — with GitHub Actions Inactive

georgehong temporarily deployed to osdc-staging June 15, 2026 23:09 — with GitHub Actions Inactive

georgehong mentioned this pull request Jun 16, 2026

[TEST-ONLY] Validate NUMA scheduling on A100 (p4d) in ue1 staging #778

Draft

[TEST-ONLY] Enable NUMA modules on arc-staging with g4dn.metal (T4) #748
georgehong wants to merge 5 commits into gh/georgehong/11/base from gh/georgehong/11/head


1 participant
