fix(recipe): reject unapplied ComponentRef.Patches; fix coherence godoc #1589

Open
yuanchen8911 wants to merge 1 commit into NVIDIA:main from yuanchen8911:fix/1588-followups
Conversation

@yuanchen8911

@yuanchen8911 yuanchen8911 commented Jul 2, 2026

Contributor

Summary

Address the follow-ups surfaced by the cross-review of #1585: reject the inert ComponentRef.Patches field (fail loud instead of silently dropping requested patches), correct the coherenceProblem type-handling wording (localformat vs Flux), and tighten the OpenAPI componentRefs contract that the same review exposed.

Motivation / Context

  • ComponentRef.Patches is never applied. The field ("patch files for Kustomize") is carried through resolution — deep-copied, overlay-merged — but no deployer applies it (localformat.Component has no patches field; every generator omits it). So a recipe/overlay that sets patches: silently produces an unpatched bundle. No recipe uses it in-tree or in the internal nkx-recipes repo, so it is purely latent.
  • coherenceProblem godoc over-generalized how deployers treat Type. The field-classifying deployers (localformat + Helm/Helmfile/ArgoCD) key off tag/path; the Flux generator switches on the declared Type (Helm-only). So a Type: Helm ref carrying tag/path builds as Kustomize under the field-classifiers but as Helm under Flux — the same ref deploys differently.
  • OpenAPI componentRefs inaccuracies the review turned up: patches was undocumented, the top-level deploymentOrder (which the server serializes and POST /v1/bundle accepts) was never declared, and the description implied a per-ref order.
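
The Type divergence described in the second bullet can be sketched roughly as follows. This is an illustration only: `ComponentRef` here is a trimmed, hypothetical stand-in, and `classifyByFields` / `classifyByType` are made-up names that mimic the two behaviors rather than the actual localformat or Flux generator code.

```go
package main

import "fmt"

// ComponentRef is a trimmed, hypothetical stand-in for the real recipe type.
type ComponentRef struct {
	Name string
	Type string // declared type, e.g. "Helm" or "Kustomize"
	Tag  string // Kustomize-style git ref
	Path string // Kustomize-style path
}

// classifyByFields mimics a field-classifying deployer: the presence of
// tag/path decides the build kind, and the declared Type is ignored.
func classifyByFields(ref ComponentRef) string {
	if ref.Tag != "" || ref.Path != "" {
		return "Kustomize"
	}
	return "Helm"
}

// classifyByType mimics a Helm-only generator that switches on the declared
// Type and fails closed on anything else (assumed behavior, per the PR text).
func classifyByType(ref ComponentRef) (string, error) {
	switch ref.Type {
	case "Helm":
		return "Helm", nil
	default:
		return "", fmt.Errorf("unsupported component type %q", ref.Type)
	}
}

func main() {
	// A Helm ref that also carries tag/path: the two strategies disagree.
	ref := ComponentRef{Name: "demo", Type: "Helm", Tag: "v1.0.0", Path: "deploy"}
	byType, err := classifyByType(ref)
	fmt.Println("field-classifier:", classifyByFields(ref))
	fmt.Println("type-switch:", byType, "err:", err)
}
```

The same ref yields two different build kinds, which is the incoherence the validation is meant to reject up front.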

Fixes: #1588
Related: #1585

Note: stacked on #1585 (coherenceProblem is introduced there, not yet on main). Until #1585 merges the diff shows both; it will rebase to just these changes once #1585 lands.

Type of Change

  • Bug fix (turns a silent drop into a clear error)
  • Documentation update (godoc + OpenAPI contract + integrator/CLI docs)

Component(s) Affected

  • Recipe engine / data (pkg/recipe)
  • API contract (api/aicr/v1/server.yaml)
  • Docs (docs/integrator/recipe-development.md, docs/user/cli-reference.md)

Implementation Notes

  • coherenceProblem rejects any ref that declares patches (all types — no deployer applies them), before the type checks; ValidateCoherence still skips disabled refs.
  • ComponentRef.Patches godoc marked NOT-APPLIED (enabled refs rejected; disabled skipped). The alternative, implementing patch application or removing the field, is tracked in #1588.
  • Type-handling wording corrected in the godoc, the Helm-ref error (recommends removing tag/path or converting to a coherent Kustomize ref — not a bare "set type Kustomize", which a tag-only ref would still fail), and the case-insensitive / default-branch comments. Lowercase is documented as backward-compat input; the OpenAPI examples are canonical.
  • OpenAPI (api/aicr/v1/server.yaml): documented patches as unsupported/rejected; declared the top-level deploymentOrder (array<string>) property with a note to preserve it on a GET→POST round trip; corrected the componentRefs description (order is top-level, not per-ref); added deploymentOrder to the /v1/bundle example.
  • Docs: recipe-development.md notes patches is unsupported/rejected; cli-reference.md example gains chart for parity with the API examples.
  • Flux + Kustomize (item 2 of #1588): no code change needed; the Flux generator already fails closed on a non-Helm ref (flux.go default → "unsupported component type"). This is captured in the godoc.
  • Test hardening: the blocking test DataProvider now honors context cancellation (returns ctx.Err() instead of parking).

Testing

go test -race ./pkg/recipe/... ./pkg/client/...    # ok
golangci-lint run -c .golangci.yaml ./pkg/recipe/... # 0 issues
yamllint api/aicr/v1/server.yaml                    # ok
make check-docs-mdx                                 # ok

Adds coherence test cases (Helm+patches and Kustomize+patches both rejected).

Risk Assessment

  • Low — rejects a field nothing sets (0 uses across the public tree and internal nkx-recipes) and no deployer applies; turns a latent silent-drop into a clear ErrCodeInvalidRequest. Remaining changes are docs/OpenAPI accuracy + a test-only fix. Trivially reversible.

Rollout notes: if patches are ever needed, implement application (or drop the field) per #1588 rather than relying on the previous silent no-op.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

@yuanchen8911 yuanchen8911 added the theme/recipes Recipe expansion, overlays, mixins, and component registry label Jul 2, 2026
@github-actions

github-actions Bot commented Jul 2, 2026

Contributor

Recipe evidence check

No leaf overlays affected by this PR.

This gate is warning-only and never blocks merge.

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

This PR normalizes componentRefs to a Helm/Kustomize schema, adds coherence validation and type back-fill/canonicalization for RecipeResult, and wires that validation through recipe loading, bundling, client adopt handling, and bundle handling. Tests and docs were updated to match the new contract and rejection behavior.

Estimated code review effort: 4 (Complex) | ~75 minutes

Suggested reviewers: lalitadithya, lockwobr

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Out of Scope Changes check | ⚠️ Warning | Several large recipe/client/bundler/API-schema changes appear to be stacked #1585 work and are unrelated to the #1588 follow-up scope. | Split the #1585 normalization/validation work into a separate PR or rebase this PR to only the patches and coherence-godoc changes. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: rejecting ComponentRef.Patches and fixing coherence docs. |
| Linked Issues check | ✅ Passed | The PR rejects unsupported patches, adds coherence validation tests/docs, and keeps Flux's non-Helm rejection behavior aligned with #1588. |
| Description check | ✅ Passed | The description matches the code and docs changes, covering patches rejection, coherence wording, and OpenAPI contract updates. |

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/aicr/v1/server.yaml`:
- Around line 1464-1498: The componentRefs schema description still mentions
deployment order even though order was removed from componentRefs[]. Update the
description in the server.yaml schema block so it no longer implies ordering and
instead points readers to deploymentOrder for sequencing. Use the componentRefs
and deploymentOrder schema descriptions as the places to adjust the wording.
- Around line 1464-1497: The `RecipeResponse` schema in `server.yaml` documents
Helm/Kustomize fields but does not state that `patches` is unsupported, so
update the OpenAPI contract to explicitly reject or forbid `patches` on this
request body. Add the unsupported-field note in the `properties`/validation text
near the existing `type`, `tag`, `source`, and `path` descriptions so
`/v1/bundle` consumers know `patches` is not accepted and will fail loudly.

In `@docs/user/cli-reference.md`:
- Around line 493-498: The normalized Helm example in the CLI reference is
missing the chart field, so the example shape is incomplete. Update the
`componentRefs` Helm entry shown alongside `deploymentOrder` to include `chart`
as well as the existing `type`, `version`, and `source`, matching the normalized
Helm format used by the API examples and the `componentRefs` documentation.

In `@pkg/client/v1/aicr_internal_test.go`:
- Around line 1124-1133: The blocking provider in
blockingReadFileProvider.ReadFile can wait forever on readUnblock and currently
ignores context cancellation. Update ReadFile to watch ctx.Done() while waiting,
and if the context is canceled return ctx.Err() instead of continuing to block.
Keep the readStarted signaling behavior intact, but ensure the wait path in
ReadFile respects the DataProvider contract and exits promptly on cancellation.
- Around line 1055-1112: Refactor TestAdoptRecipe_RejectsIncoherentRef into a
table-driven test because it exercises three distinct adoptRecipe scenarios.
Keep the shared setup in TestAdoptRecipe_RejectsIncoherentRef and move the
incoherent Helm+tag rejection, the lowercase type canonicalization, and the
type-less registry back-fill checks into table cases. Use the same
client.adoptRecipe and a base helper, but make per-case expected error/result
assertions data-driven so adding new coherence cases stays consistent.

In `@pkg/recipe/componentref_coherence_test.go`:
- Around line 124-165: Refactor TestRecipeResultValidateCoherence and the other
multi-scenario coherence test into table-driven tests using a tests :=
[]struct{...} loop with subtests, so each scenario (coherent, incoherent,
disabled, nil receiver) is defined declaratively and checked per case. Keep the
existing assertions for expected error presence, ErrCodeInvalidRequest matching,
and required substrings, but move them into per-case fields and a shared loop;
use the existing RecipeResult.ValidateCoherence and ComponentRef symbols to
locate the logic.

In `@pkg/recipe/metadata.go`:
- Around line 98-101: Tighten the godoc for the Patches field in metadata.go so
it reflects current behavior: note that ValidateCoherence only skips disabled
refs, so the comment should say “enabled ref” instead of implying all refs are
checked. Also clarify the Flux behavior wording so it does not say Flux “never
sees” these refs; it should state that Flux may still receive coherent external
Kustomize refs and reject them by Type. Use the Patches and ValidateCoherence
symbols to update the surrounding comment blocks consistently.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 3bdfc9aa-d4ec-40bd-9cce-b6dcc09ef464

📥 Commits

Reviewing files that changed from the base of the PR and between 55fd14f and 23c5f3f.

📒 Files selected for processing (11)
  • api/aicr/v1/server.yaml
  • docs/user/cli-reference.md
  • pkg/bundler/bundler.go
  • pkg/bundler/bundler_test.go
  • pkg/client/v1/aicr_internal_test.go
  • pkg/client/v1/bundle.go
  • pkg/recipe/componentref_coherence_test.go
  • pkg/recipe/loader.go
  • pkg/recipe/metadata.go
  • pkg/recipe/metadata_store.go
  • pkg/server/bundle_handler_test.go

Comment thread api/aicr/v1/server.yaml
Comment thread api/aicr/v1/server.yaml Outdated
Comment thread docs/user/cli-reference.md
Comment thread pkg/client/v1/aicr_internal_test.go
Comment thread pkg/client/v1/aicr_internal_test.go
Comment thread pkg/recipe/componentref_coherence_test.go
Comment thread pkg/recipe/metadata.go Outdated
@yuanchen8911 yuanchen8911 force-pushed the fix/1588-followups branch 5 times, most recently from 32bd86c to 0d75f25 Compare July 2, 2026 03:29
@yuanchen8911 yuanchen8911 marked this pull request as ready for review July 2, 2026 13:55
@yuanchen8911 yuanchen8911 requested a review from a team as a code owner July 2, 2026 13:55
@yuanchen8911 yuanchen8911 requested a review from mchmarny July 2, 2026 13:55
@yuanchen8911 yuanchen8911 enabled auto-merge (squash) July 2, 2026 13:57
@NVIDIA NVIDIA deleted a comment from github-actions Bot Jul 2, 2026

@mchmarny mchmarny left a comment

Member

Clean, tightly-scoped follow-up to #1585. Verified against the head branch: the patches check fails closed first in coherenceProblem(), ValidateCoherence skips disabled refs as claimed, the new test cases cover both Helm+patches and Kustomize+patches, and no in-tree recipe sets patches so nothing regresses — the low-risk assessment holds. godoc/OpenAPI/doc accuracy improvements all check out.

One nit inline on the 400 message wording; nothing blocks the substance.

Two non-code items to note:

  • CI: Tier 1: eks-training (argocd-oci) failed at "Run KWOK test" while every other Tier 1 combo (including other argocd-oci recipes) passed. No recipe uses patches and the only logic change here is the patches rejection, so this isn't attributable to the diff — looks like a flake (the Actions jobs API was also 502-ing). Worth a re-run to confirm green.
  • Rebase: needs-rebase / stacked on #1585 — can't merge until that lands, as the PR notes.

Comment thread pkg/recipe/metadata.go Outdated
// no patches field). Fail closed on any type rather than drop it silently.
if len(ref.Patches) > 0 {
	return fmt.Sprintf("component %q declares patches, but no deployer applies patch files; "+
		"remove `patches` (it would be silently dropped). See #1588.", ref.Name)

Member

nit: the parenthetical (it would be silently dropped) reads as contradictory in a user-facing 400; the whole point of this change is that the ref is now rejected, not dropped. Suggest stating why removal is safe instead, e.g. "remove `patches` (it is never applied by any deployer). See #1588."

Contributor Author

Good catch — the parenthetical predated the fail-closed behavior and read as contradictory. Reworded in 19da720 to state why removal is safe: "remove patches (removing it does not change the generated bundle). See #1588." Kept the leading clause ("no deployer applies patch files") as the reason, so the parenthetical now carries the safety statement rather than repeating it.

On the two non-code notes: the branch is rebased onto main with #1585 merged, and the force-push re-runs the full Tier 1 matrix, which covers the flaky eks-training (argocd-oci) KWOK job.

One heads-up from the local qualify run: make scan now fails on GHSA-fxhp-mv3v-67qp (oras.land/oras-go/v2 <= 2.6.1, published 2026-07-01, no patched release yet). Pre-existing on main — this PR touches no dependencies.

…oc (NVIDIA#1588)

Follow-ups surfaced by the cross-review of NVIDIA#1585.

ComponentRef.Patches is carried through resolution (deep-copied, overlay-
merged) but applied by NO deployer — localformat's Component has no patches
field and every generator omits it — so a recipe that declares `patches:`
silently produces an unpatched bundle. Reject any ref that declares patches
in coherenceProblem (fail loud) rather than dropping them silently, and mark
the field's godoc as unapplied. No recipe uses `patches` in-tree or in the
internal recipes repo, so this only turns a latent silent-drop into a clear
error; implementing patch application (or removing the field) is the
alternative, tracked in NVIDIA#1588.

Also correct the coherenceProblem godoc: it over-generalized that "the
deployers do not trust the declared Type." That holds for the
field-classifying deployers (localformat and the Helm/Helmfile/ArgoCD
generators built on it), but the Flux generator is Helm-only and switches on
Type (it already rejects a non-Helm ref with a clear error). Clarify this.

Fixes: NVIDIA#1588
Related: NVIDIA#1585

Document the rejected field for authors: add patches to the OpenAPI
componentRefs schema (marked unsupported/rejected) and a note in
docs/integrator/recipe-development.md, and clarify the Go field comment that
only ENABLED refs are rejected (disabled are skipped). Also finish the
type-handling wording: the godoc and the Helm-ref error no longer over-claim
that all deployers classify by tag/path — the field-classifying deployers do
(building a mismatched Helm ref as Kustomize), while the Flux deployer
switches on canonical Type and rejects a non-Helm ref outright.

Review nits: correct the componentRefs schema description (order is the
top-level deploymentOrder, not per-ref), add chart to the cli-reference Helm
example for parity with the API examples, and make the blocking test
DataProvider honor context cancellation.

Declare the top-level deploymentOrder property in the RecipeResponse schema
(array of component names) and include it in the /v1/bundle example, so a
GET->POST round trip through a generated client preserves deployment
sequencing instead of dropping it. Correct the remaining Flux wording: a
Helm ref carrying tag/path builds as Kustomize under the field-classifying
deployers but as Helm under Flux (which switches on the declared Type) — it
is not rejected by Flux; only an explicitly non-Helm type is. Fixed in the
godoc bullet, the Helm-ref error, and the case-insensitive/default-branch
comments.

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
@yuanchen8911 yuanchen8911 requested a review from mchmarny July 2, 2026 15:37
Labels

area/api area/docs needs-rebase size/M theme/recipes Recipe expansion, overlays, mixins, and component registry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Follow-ups from #1585: unapplied ComponentRef.Patches, Flux+Kustomize gap, coherence godoc

2 participants