
[Bug]: --deployer flux drops PreManifestFiles, so GKE ResourceQuota fix (#921) does not reach flux bundles #923

Description

@yuanchen8911

Summary

PR #921 fixed issue #915 (GKE ResourceQuota admission blocks system-*-critical pods outside kube-system) for the helm, helmfile, argocd, and argocd-helm deployers by synthesizing a pre-phase ResourceQuota via collectComponentPreManifests. The PR description claims flux benefits too, but the flux deployer never consumes ComponentPreManifests — so a GKE bundle generated with --deployer flux still hits the original symptom from #915.

This is also a pre-existing gap from #856 (feat: os-talos mixin + bundler preManifestFiles support), which added PreManifestFiles to the recipe schema and wired four of the five deployers. The flux deployer was missed at that point and was never caught because the only production user of PreManifestFiles at the time was os-talos, which --deployer flux users don't exercise.

Affected component

pkg/bundler/deployer/flux: flux.Generator has no ComponentPreManifests field, and the DeployerFlux case in bundler.buildDeployer only calls collectComponentManifests (post), not collectComponentPreManifests. The synthesized GKE quota is therefore computed but silently dropped before it reaches the flux generator.
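To make the gap concrete, here is a minimal Go model of the flux path. `FluxGenerator`, `ManifestSet` and `buildFluxGenerator` are hypothetical simplifications, not the real types. The two closures stand in for collectComponentManifests and collectComponentPreManifests, whose actual signatures are not shown in this issue. Before the fix, the `ComponentPreManifests` assignment did not exist, so anything the pre-collector produced was discarded.

```go
package main

import "fmt"

// ManifestSet maps a component name to its manifest files (filename -> contents),
// matching the map[string]map[string][]byte shape named in the fix outline.
type ManifestSet map[string]map[string][]byte

// FluxGenerator is a hypothetical, trimmed-down stand-in for flux.Generator.
type FluxGenerator struct {
	ComponentManifests    ManifestSet // post-phase manifests (<name>-post)
	ComponentPreManifests ManifestSet // pre-phase manifests (<name>-pre): the missing field
}

// buildFluxGenerator models the DeployerFlux case of bundler.buildDeployer.
// collectPost and collectPre are placeholders for the real collectors.
func buildFluxGenerator(collectPost, collectPre func() ManifestSet) *FluxGenerator {
	return &FluxGenerator{
		ComponentManifests:    collectPost(),
		ComponentPreManifests: collectPre(), // absent before the fix, so pre-manifests were dropped
	}
}

func main() {
	pre := func() ManifestSet {
		return ManifestSet{"gpu-operator": {"gke-critical-pods-quota.yaml": []byte("kind: ResourceQuota\n")}}
	}
	post := func() ManifestSet { return ManifestSet{} }

	g := buildFluxGenerator(post, pre)
	fmt.Println(len(g.ComponentPreManifests["gpu-operator"])) // 1
}
```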

Repro

  1. Generate a GKE recipe (e.g. via the cuj1-gke-config.md flow).
  2. aicr bundle --recipe recipe.yaml --deployer flux --output ./bundle.
  3. Apply the bundle to a fresh GKE Standard cluster.
  4. The gpu-operator HelmRelease stays unready; kubectl -n gpu-operator describe deploy/gpu-operator shows the same FailedCreate error as #915: insufficient quota to match these scopes: [{PriorityClass In [system-node-critical system-cluster-critical]}].

Verify the bundle omits the synthesized quota:

```bash
find ./bundle -name 'gke-critical-pods-quota.yaml'   # prints nothing: quota manifest was not emitted
ls -d ./bundle/gpu-operator-pre 2>/dev/null          # prints nothing: no gpu-operator-pre/ wrapper
```

Code references

  • pkg/bundler/bundler.go:388-409: the DeployerFlux switch case calls only collectComponentManifests.
  • pkg/bundler/deployer/flux/flux.go:99-150: the flux.Generator struct has ComponentManifests only, with no ComponentPreManifests field.
  • Contrast with helm.Generator (pkg/bundler/deployer/helm/helm.go:80), helmfile.Generator (pkg/bundler/deployer/helmfile/helmfile.go:75), argocd.Generator (pkg/bundler/deployer/argocd/argocd.go:138), argocdhelm.Generator (pkg/bundler/deployer/argocdhelm/argocdhelm.go:123) — all four have the field and forward to localformat.Options.ComponentPreManifests.

Documentation gap

pkg/bundler/deployer/flux/doc.go describes the generated bundle structure but only shows <name>-post/ for mixed components (line 84). The <name>-pre/ wrapper (which other deployers emit when PreManifestFiles is non-empty) is absent from the example tree because the feature isn't implemented. This package-level doc should gain a <name>-pre/ entry once the fix lands, and the "Deployment Ordering" section needs a sentence on how pre-folders thread into the `dependsOn` chain (`previous → <name>-pre → <name> → <name>-post → next`).

Fix outline

  1. Add ComponentPreManifests map[string]map[string][]byte to flux.Generator.
  2. Wire b.collectComponentPreManifests(...) in bundler.buildDeployer's DeployerFlux case alongside the existing post-manifest collection.
  3. In generateComponentResources, when g.ComponentPreManifests[ref.Name] is non-empty, emit a <name>-pre HelmRelease via generateManifestHelmChart (reusing the same helper as -post), and rewire dependsOn so the primary depends on <name>-pre instead of the previous component. Chain becomes: previous → <name>-pre → <name> → <name>-post → next.
  4. Add a <name>-pre collision guard mirroring the rule in pkg/bundler/deployer/localformat/writer.go:258-263.
  5. Update pkg/bundler/deployer/flux/doc.go: add <name>-pre/ to the generated structure example and document the dependsOn chain ordering.
  6. Tests in flux_test.go: pre-only component, pre+post on same component, pre on index-0 component (head of chain), <name>-pre collision rejection. End-to-end: rebuild GKE bundle with --deployer flux and confirm gpu-operator-pre/helmrelease.yaml is emitted.
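Step 4's collision guard could look roughly like the sketch below. The exact condition and error text in pkg/bundler/deployer/localformat/writer.go:258-263 are not quoted in this issue, so the rule here is an assumption: a component that carries pre-manifests must not have its generated `<name>-pre` release name equal to another component's name. `checkPreCollisions` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// checkPreCollisions rejects a bundle where the synthesized <name>-pre release
// of a component equals the name of another component.
// ASSUMPTION: approximates the localformat writer rule; verify against writer.go.
func checkPreCollisions(names []string, pre map[string]map[string][]byte) error {
	taken := make(map[string]bool, len(names))
	for _, n := range names {
		taken[n] = true
	}
	// Only components with at least one pre-manifest get a <name>-pre release.
	withPre := make([]string, 0, len(pre))
	for n, files := range pre {
		if len(files) > 0 {
			withPre = append(withPre, n)
		}
	}
	sort.Strings(withPre) // deterministic error when several names collide
	for _, n := range withPre {
		if taken[n+"-pre"] {
			return fmt.Errorf("component %q has pre-manifests, but generated release %q collides with an existing component", n, n+"-pre")
		}
	}
	return nil
}

func main() {
	pre := map[string]map[string][]byte{
		"gpu-operator": {"gke-critical-pods-quota.yaml": []byte("kind: ResourceQuota\n")},
	}
	fmt.Println(checkPreCollisions([]string{"gpu-operator", "network-operator"}, pre))
	fmt.Println(checkPreCollisions([]string{"gpu-operator", "gpu-operator-pre"}, pre))
}
```

The first call prints `<nil>`; the second reports the gpu-operator-pre collision.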

Severity

Medium for GKE+flux users (full regression of the #915 symptom); low overall because --deployer flux on GKE H100 isn't the current demo path. The bundle is well-formed but silently incomplete: no error surfaces at generation time, only at reconcile time on the cluster.
