charts: set runAsUser/runAsGroup=65532 on container securityContext (6 components) (AROSLSRE-929) #5374

Open
raelga wants to merge 1 commit into Azure:main from raelga:raelga/aroslsre-929-runasuser-hardening

@raelga
Collaborator

@raelga raelga commented May 24, 2026

What

Add runAsUser: 65532 and runAsGroup: 65532 to the container securityContext of every ARO-HCP-owned chart that currently sets runAsNonRoot: true without an explicit user.

Why

On 2026-05-24, the kube-applier deployment failed for 72+ minutes on int-uksouth-mgmt-1 with CreateContainerConfigError: container has runAsNonRoot and image will run as root (fix shipped via #5373, AROSLSRE-926). Root cause was the same anti-pattern across the repo: chart says runAsNonRoot: true, no runAsUser, so kubelet validates against the image OCI config User field. When that field is empty on the registry (as it was for arohcpsvcint.azurecr.io/kube-applier@sha256:9378a76… — likely a buildkit / multi-stage cache artefact), the pod is rejected.

Audit (AROSLSRE-929 / closed AROSLSRE-927) identified 6 ARO-HCP-owned charts with the same anti-pattern. Each Dockerfile already sets USER 65532:65532 so the fix is purely additive in the chart and matches established behavior. Making the UID/GID explicit in the pod spec removes the dependency on registry-side metadata being preserved through future buildkit changes.
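
To make the failure mode concrete, the fragment below sketches the two container securityContext shapes side by side. The container names are illustrative and not taken from any chart in this repo; the comments summarise the kubelet's runAsNonRoot check as described above.

```yaml
# Illustrative only: container names are hypothetical.
containers:
  # Before: kubelet has no UID in the pod spec, so it reads the User field
  # from the image config. If that field is empty, the image is treated as
  # running as root and the container fails with CreateContainerConfigError.
  - name: implicit-user
    securityContext:
      runAsNonRoot: true

  # After: the UID/GID come from the pod spec, so the runAsNonRoot check
  # passes (65532 != 0) without consulting the image config.
  - name: explicit-user
    securityContext:
      runAsNonRoot: true
      runAsUser: 65532
      runAsGroup: 65532
```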

Components covered

| Component | Dockerfile USER | File patched |
|---|---|---|
| admin | 65532:65532 | admin/deploy/templates/admin.deployment.yaml |
| backend | 65532:65532 | backend/deploy/templates/backend.deployment.yaml |
| frontend | 65532:65532 | frontend/deploy/templates/_helpers.tpl (define frontend.deployment) |
| mgmt-agent | 65532:65532 | mgmt-agent/deploy/templates/deployment.yaml |
| sessiongate | 65532:65532 | sessiongate/deploy/templates/deployment.yaml |
| tooling/aro-hcp-exporter | 65532:65532 | tooling/aro-hcp-exporter/deploy/templates/deployment.yaml |

The diff is identical in every chart:

        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          runAsNonRoot: true
+         runAsUser: 65532
+         runAsGroup: 65532
          seccompProfile:
            type: RuntimeDefault

Out of scope

  • kube-applier — already fixed in kube-applier: set runAsUser/runAsGroup=65532 on container securityContext (AROSLSRE-926) #5373.
  • acrpull, observability/prometheus, route-monitor-operator — already declare runAsUser in their charts.
  • ACM-bundled upstream charts (4 files with the same anti-pattern) — should be raised against open-cluster-management upstream, not patched here.
  • frontend/deploy/templates/frontend.secret-refresher.yaml — busybox sidecar, no securityContext at all (separate hardening item).
  • image-sync/oc-mirror Dockerfile has no USER; only used in CI pipeline shell, not k8s-deployed.
  • Root-cause investigation of the OCI User field disappearing on registry push (buildkit / multi-stage cache). Chart-side fix here is the safest backstop regardless.

Validation

  • cd tooling/helmtest && UPDATE=true go test ./... — fixtures regenerated for all 6 components (and downstream frontend fixtures that include the helper template)
  • cd tooling/helmtest && go test ./... — green without UPDATE flag
  • Diff: 17 files, +40/-0 (6 chart templates + 11 fixture files)
  • Spot-checked rendered output (e.g. frontend/zz_fixture_TestHelmTemplate_dev_westus3_svc_1_aro_hcp_frontend_dev.yaml): both containers in the rendered Deployment now carry runAsUser: 65532 / runAsGroup: 65532

Refs

…6 components)

All six ARO-HCP-owned charts that declare runAsNonRoot: true without
runAsUser are exposed to the same failure mode that hit kube-applier on
2026-05-24 (CreateContainerConfigError: container has runAsNonRoot and
image will run as root, raised when the image OCI config User field is
empty on the registry side).

Each Dockerfile already sets USER 65532:65532. Making runAsUser and
runAsGroup explicit in the chart removes the dependency on registry-side
image metadata being preserved through the buildkit / multi-stage
pipeline. Pod admission becomes deterministic regardless of build
artefacts.

Components updated:
- admin
- backend
- frontend (via deploy/templates/_helpers.tpl)
- mgmt-agent
- sessiongate
- tooling/aro-hcp-exporter

Fixtures regenerated via UPDATE=true go test in tooling/helmtest.

Refs https://issues.redhat.com/browse/AROSLSRE-929 (Story)
     https://issues.redhat.com/browse/AROSLSRE-930 (admin)
     https://issues.redhat.com/browse/AROSLSRE-931 (backend)
     https://issues.redhat.com/browse/AROSLSRE-932 (frontend)
     https://issues.redhat.com/browse/AROSLSRE-933 (mgmt-agent)
     https://issues.redhat.com/browse/AROSLSRE-934 (sessiongate)
     https://issues.redhat.com/browse/AROSLSRE-935 (aro-hcp-exporter)
Copilot AI review requested due to automatic review settings May 24, 2026 20:55
@openshift-ci openshift-ci Bot requested review from deads2k and geoberle May 24, 2026 20:55
Contributor

Copilot AI left a comment

Pull request overview

This PR hardens ARO-HCP-owned Helm charts by making the non-root UID/GID explicit in container securityContext, preventing kubelet startup failures when an image’s OCI config User field is missing.

Changes:

  • Add runAsUser: 65532 and runAsGroup: 65532 to container securityContext in 6 charts that already had runAsNonRoot: true.
  • Regenerate Helm template fixtures/testdata to reflect the new rendered securityContext fields.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| admin/deploy/templates/admin.deployment.yaml | Set container runAsUser/runAsGroup to 65532 alongside runAsNonRoot. |
| admin/zz_fixture_TestHelmTemplate_dev_westus3_svc_1_admin_api.yaml | Regenerated rendered fixture with explicit UID/GID. |
| admin/testdata/zz_fixture_TestHelmTemplate_admin_api_mise_enabled.yaml | Regenerated rendered testdata fixture with explicit UID/GID. |
| backend/deploy/templates/backend.deployment.yaml | Set container runAsUser/runAsGroup to 65532 alongside runAsNonRoot. |
| backend/zz_fixture_TestHelmTemplate_dev_westus3_svc_1_aro_hcp_backend_dev.yaml | Regenerated rendered fixture with explicit UID/GID. |
| backend/testdata/zz_fixture_TestHelmTemplate_backend_mi_mock_and_arm_perms_mgr_identities_unset.yaml | Regenerated rendered testdata fixture with explicit UID/GID. |
| backend/testdata/zz_fixture_TestHelmTemplate_backend_clstr_scoped_identities_role_set_name_public.yaml | Regenerated rendered testdata fixture with explicit UID/GID. |
| frontend/deploy/templates/_helpers.tpl | Add explicit UID/GID to the frontend deployment helper’s container securityContext. |
| frontend/zz_fixture_TestHelmTemplate_dev_westus3_svc_1_aro_hcp_frontend_dev.yaml | Regenerated rendered fixture (both containers show explicit UID/GID). |
| frontend/testdata/zz_fixture_TestHelmTemplate_frontend_mise_enabled.yaml | Regenerated rendered testdata fixture (both containers show explicit UID/GID). |
| frontend/testdata/zz_fixture_TestHelmTemplate_frontend_connect_socket.yaml | Regenerated rendered testdata fixture (both containers show explicit UID/GID). |
| mgmt-agent/deploy/templates/deployment.yaml | Set container runAsUser/runAsGroup to 65532 alongside runAsNonRoot. |
| mgmt-agent/zz_fixture_TestHelmTemplate_dev_westus3_mgmt_1_mgmt_agent.yaml | Regenerated rendered fixture with explicit UID/GID. |
| sessiongate/deploy/templates/deployment.yaml | Set container runAsUser/runAsGroup to 65532 alongside runAsNonRoot. |
| sessiongate/zz_fixture_TestHelmTemplate_dev_westus3_svc_1_sessiongate.yaml | Regenerated rendered fixture with explicit UID/GID. |
| tooling/aro-hcp-exporter/deploy/templates/deployment.yaml | Set container runAsUser/runAsGroup to 65532 alongside runAsNonRoot. |
| dev-infrastructure/zz_fixture_TestHelmTemplate_dev_westus3_svc_1_aro_hcp_exporter.yaml | Regenerated rendered fixture with explicit UID/GID. |

@sclarkso
Collaborator

/test e2e-parallel

@tuxerrante tuxerrante left a comment

LGTM — clean, well-scoped security hardening. A few confirmations and a minor follow-up note:

UID 65532 is correct. All 6 Dockerfiles consistently set USER 65532:65532 (the distroless nonroot convention). This is distinct from UID 65534 (nobody, the conventional unprivileged account on Unix-like systems). Both are valid non-root UIDs, but 65532 is the de facto standard for application workloads in distroless/Chainguard base images. The chart values match the Dockerfiles.

Volume ownership — no concern. Checked all 6 deployments:

  • mgmt-agent, sessiongate, aro-hcp-exporter — no volumes mounted at all.
  • backend — CSI secret store (readOnly: true at volume level) + ConfigMap (readOnly: true on volumeMount). Both effectively read-only.
  • admin, frontend — only writable mount is mdsd-asa-run-vol (hostPath /var/run/mdsd), used for connecting to an existing Unix domain socket, not file creation. No fsGroup or ownership fixup needed.

Minor follow-up (not blocking): backend's backend-service-key-vault volumeMount is missing an explicit readOnly: true on the mount spec — the CSI volume definition enforces it at the driver level, but adding it to the volumeMount would be defense-in-depth. Pre-existing, not introduced by this PR.
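
A minimal sketch of that follow-up, assuming the container is named backend and the secret is mounted at /mnt/secrets. Both are placeholders; only the volume name comes from the note above.

```yaml
# Sketch only: container name and mountPath are hypothetical.
containers:
  - name: backend
    volumeMounts:
      - name: backend-service-key-vault
        mountPath: /mnt/secrets  # placeholder path
        readOnly: true           # explicit on the mount, in addition to the CSI volume's readOnly
```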

Cross-repo note: ARO-RP has the same runAsNonRoot: true without runAsUser anti-pattern in 2 Gatekeeper static resource YAMLs (pkg/operator/controllers/guardrails/staticresources/gk_*_deployment.yaml). Lower risk since these track upstream Gatekeeper, but worth a preventive follow-up.

@openshift-ci

openshift-ci Bot commented May 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: raelga, tuxerrante

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tuxerrante

Addendum to the cross-repo note in my review above: after deeper investigation, no fix is needed in ARO-RP. The 2 Gatekeeper static resource YAMLs with the same runAsNonRoot: true / no runAsUser pattern run on OpenShift customer clusters under the restricted-v2 SCC. That SCC uses the MustRunAsRange strategy and automatically injects a runAsUser from the namespace's UID range at admission time, so the kubelet never falls back to checking the OCI image User field. This makes those pods immune to the CreateContainerConfigError that affected ARO-HCP on AKS. Adding an explicit runAsUser there would diverge further from the upstream Gatekeeper manifests for no functional benefit.
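
For readers less familiar with SCC admission, the fragment below sketches the mechanism. The namespace name, pod name, image reference, and UID range are made-up example values; the annotation keys are the standard OpenShift ones.

```yaml
# Example values only: names, image, and UID range are hypothetical.
apiVersion: v1
kind: Namespace
metadata:
  name: example-gatekeeper-ns
  annotations:
    # <first UID>/<range size>, allocated by OpenShift per namespace
    openshift.io/sa.scc.uid-range: 1000680000/10000
---
# Relevant parts of a pod in that namespace after admission under restricted-v2.
apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: example-gatekeeper-ns
  annotations:
    openshift.io/scc: restricted-v2
spec:
  containers:
    - name: manager
      image: example.invalid/gatekeeper:placeholder
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000680000  # injected by SCC admission; not present in the manifest
```

Because the UID is already in the pod spec by the time the kubelet sees it, the runAsNonRoot check does not depend on the image config.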


4 participants