feat(helm): add Flow (RLA) helm chart and prereqs wiring #1932

Open
shayan1995 wants to merge 1 commit into NVIDIA:main from shayan1995:feat/flow-helm-chart

Conversation

@shayan1995
Contributor

Description

Adds a standalone helm-flow/ chart deploying the Flow rack lifecycle orchestrator (formerly RLA) as a single pod with three gRPC containers - flow (50051), psm (50052), nsm (50053). Mirrors the upstream forged deployment: containers share a SPIFFE cert and talk over headless Services that DNS-resolve to the pod IP.
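
As a rough illustration of the headless-Service wiring described above, the fragment below sketches what one of the three Services might look like. The Service name and the selector label are assumptions made for this example and are not taken from the chart; only the port (50052 for psm) and the flow namespace come from the description.

```yaml
# Sketch only: a headless Service for the psm container.
apiVersion: v1
kind: Service
metadata:
  name: psm                         # hypothetical name
  namespace: flow
spec:
  clusterIP: None                   # headless: DNS resolves straight to the pod IP
  selector:
    app.kubernetes.io/name: flow    # hypothetical label on the single Flow pod
  ports:
    - name: grpc
      port: 50052
      targetPort: 50052
```

With `clusterIP: None` there is no virtual IP or kube-proxy load balancing, so a lookup of the Service name returns the pod IP directly, which fits three containers sharing one pod.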

Wires per-component dependencies into helm-prereqs so ./setup.sh brings flow up end-to-end:

  • postgresql.yaml: provisions flow/psm/nsm databases and roles on nico-pg-cluster
  • eso-external-secrets.yaml: ClusterExternalSecrets sync the per-service DB credentials into the flow namespace
  • flow-vault-tokens-job.yaml (new): post-install hook mints scoped Vault tokens for psm/nsm and writes them as Secrets in the flow ns
  • values.yaml: new flow.enabled / flow.namespace toggles; flips vault.nicoApiK8sAuth.enabled=true (carbide-api requires the role)
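
The values toggles in the last bullet could look roughly like the fragment below. The key paths come from the bullet itself; the value shown for `flow.enabled` and the `flow` namespace name are assumptions for illustration, not confirmed chart defaults.

```yaml
# helm-prereqs/values.yaml (sketch)
flow:
  enabled: true        # assumed value; gates the Flow-specific prereqs
  namespace: flow      # namespace that receives the DB credentials and Vault token Secrets

vault:
  nicoApiK8sAuth:
    enabled: true      # flipped on by this PR because carbide-api requires the role
```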

setup.sh phase 7i deploys the chart: it pre-applies the Certificate to avoid cert-manager / FailedMount races, waits for the Vault tokens and ESO DB-credential syncs, then runs helm upgrade --install for flow. clean.sh and health-check.sh are updated to cover the new namespace and resources.

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

helm-flow/ lives at the repo root (not under helm/charts/) so the nico umbrella does not auto-discover it as a subchart. The Namespace template carries helm.sh/resource-policy: keep so uninstalling flow does not wipe the prereqs-managed secrets that live in the namespace.
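
For reference, a Namespace manifest carrying that annotation would look something like the sketch below. The namespace name is written literally here for readability; the real template presumably takes it from chart values.

```yaml
# Sketch only: Namespace that Helm leaves in place on uninstall.
apiVersion: v1
kind: Namespace
metadata:
  name: flow
  annotations:
    helm.sh/resource-policy: keep   # Helm skips deleting this resource on uninstall
```

Because the Namespace object survives `helm uninstall`, the Secrets that helm-prereqs wrote into it survive as well.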

@shayan1995 shayan1995 requested review from a team as code owners May 26, 2026 20:15
@copy-pr-bot

copy-pr-bot Bot commented May 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@shayan1995 shayan1995 force-pushed the feat/flow-helm-chart branch from 9d44e33 to 32afe1b Compare May 26, 2026 20:24
@ajf
Collaborator

ajf commented May 26, 2026

Shouldn't this go in helm/charts/nico-flow? Why the top-level?

@shayan1995 shayan1995 requested a review from jingjingl1 May 26, 2026 21:34
@shayan1995 shayan1995 force-pushed the feat/flow-helm-chart branch from 32afe1b to 77235e2 Compare May 27, 2026 06:04
Comment thread helm/charts/nico-flow/templates/_helpers.tpl Outdated
Comment thread helm/charts/nico-flow/values.yaml
Comment thread helm/charts/nico-flow/values.yaml Outdated
Comment thread helm/charts/nico-flow/templates/deployment.yaml
Comment thread helm/charts/nico-flow/values.yaml
Comment thread helm-prereqs/values.yaml
Comment thread helm-prereqs/setup.sh
Comment thread helm-prereqs/setup.sh
Comment thread helm-prereqs/templates/flow-vault-tokens-job.yaml Outdated
Comment thread helm-prereqs/setup.sh Outdated
@shayan1995 shayan1995 force-pushed the feat/flow-helm-chart branch 2 times, most recently from 5e4f945 to 3ce5e29 Compare May 27, 2026 23:31
Contributor

@kunzhao-nv kunzhao-nv left a comment

LGTM, thanks!

Adds the Flow rack lifecycle orchestrator (formerly RLA) helm chart at
helm/charts/nico-flow/, alongside the rest of the NICo subcharts. Flow
runs as a single pod with three gRPC containers — flow (50051), psm
(50052), nsm (50053) — sharing a SPIFFE cert and communicating over
headless Services that DNS-resolve to the pod IP. This mirrors the
upstream forged deployment.

Flow ships as a STANDALONE Helm release (release name "flow", namespace
"flow"), NOT as part of `helm install nico ./helm`. The nico umbrella
declares it as a conditional dependency with nico-flow.enabled defaulted
to false in helm/values.yaml — this keeps the chart in its conventional
helm/charts/ location while preventing Helm v3+ from auto-rendering it
into the nico release (where it would conflict with nico-prereqs over
the nico-system namespace).
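
A conditional dependency of this kind is usually declared as sketched below. The chart version is a placeholder assumed for the example; the `condition` key and the `false` default follow the description above.

```yaml
# helm/Chart.yaml (umbrella), sketch of the dependency entry
dependencies:
  - name: nico-flow
    version: "0.1.0"                # hypothetical version
    condition: nico-flow.enabled    # subchart renders only when this value is true
---
# helm/values.yaml, sketch of the default
nico-flow:
  enabled: false                    # keeps nico-flow out of the nico release
```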

Wires per-component dependencies into helm-prereqs so ./setup.sh brings
flow up end-to-end:
  - postgresql.yaml: provisions flow/psm/nsm databases and roles on
    nico-pg-cluster
  - eso-external-secrets.yaml: ClusterExternalSecrets sync the
    per-service DB credentials into the flow namespace
  - flow-vault-tokens-job.yaml (new): post-install hook mints scoped
    Vault tokens for psm/nsm and writes them as Secrets in the flow ns
  - values.yaml: new flow.enabled / flow.namespace toggles; flips
    vault.nicoApiK8sAuth.enabled=true (carbide-api requires the role)

setup.sh phase 7i deploys the chart: it pre-applies the Certificate to
avoid cert-manager / FailedMount races, waits for the Vault tokens and
ESO DB-credential syncs, then runs helm upgrade --install for flow.
clean.sh and health-check.sh are updated to cover the new namespace and
resources.

The nico-flow Namespace template carries helm.sh/resource-policy: keep
so uninstalling flow does not wipe the prereqs-managed secrets that
live in the namespace.
@shayan1995 shayan1995 force-pushed the feat/flow-helm-chart branch from 3ce5e29 to a0c2adf Compare May 28, 2026 06:23
@shayan1995 shayan1995 enabled auto-merge (squash) May 28, 2026 06:26
3 participants