services: name required for new services, adopt by host for updates, heal legacy on read#165
Conversation
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
|
💬 Discussion in Slack: #pr-review-agent-vault-165-services-auto-slug-missing-names-so-legacy-entries-hea Posted by Review Police — reviews, comments, new commits, and CI failures will stream into this channel. |
|
@claude review |
|
@claude review |
PR #164 advertised that legacy services without a Name field would be slug-backfilled lazily on read and that write-time callers could omit Name to have one derived from host+path. Neither path was implemented: loadServices returned legacy entries with Name="", and the UI's PUT-all-services edit flow then failed validation on the unedited siblings with "Invalid services: service N: name is required". Add broker.Slugify(host, path) and broker.AssignSlugNames(services, reserved), and call them at every ingress where an empty Name should heal: loadServices (read), handleServicesUpsert + handleServicesSet (write), and normalizeProposalServices for ActionSet entries. CLI flag-level --name requirement drops to match the server contract.
…ast-path - Hoist the suffix-and-truncate loop into broker.DisambiguateSlug so normalizeProposalServices reuses it instead of carrying a verbatim copy that had already drifted in trim style. - AssignSlugNames drops the `reserved` parameter — every caller passed nil. - AssignSlugNames short-circuits when every name is already set, so loadServices's per-read call is a single linear scan on healed vaults.
Companion to the auto-slug docstring alignment: tightens up the broker/brokercore/server seams that the auto-slug rollout exposed — broker.go pattern matching, credential validation in brokercore, and the discovery/services handlers — with the test coverage to match.
The alias has been deprecated with a one-time stderr warning since the rename to AGENT_VAULT_TOKEN. Drops the read fallback in cmd/helpers.go and the TS SDK, stops writing it on the child process from vault run and the container isolation env, prunes it from skills/docs, and removes the deprecation-path tests.
44421c8 to
2a4c7db
Compare
|
@claude review |
PR #165 review surfaced two reachability bugs and one race in the write-side auto-slug path. Revert to "name required for new services" and let an empty Name reference an existing service only when (Host, Path) uniquely matches one already in the vault — same pattern as ActionDelete-by-host. Legacy unnamed entries still heal on read. Adds adoptByHost(services, existing, rebindStale) used by both POST upsert (rebindStale=false) and proposal normalize (rebindStale=true). The proposal path's rebind closes the create→rename→apply race where the existing service was renamed between proposal create and apply.
…ensive Write paths now route through handle_services.adoptByHost, which rejects empty Names with no unique host match instead of auto-slugging. The remaining production caller of AssignSlugNamesAvoiding is the read-side heal (via AssignSlugNames, which passes nil existing), so the reservation branch is unreachable in practice. Docstring now says so; the test that exercises the ambiguous-host fallthrough now explains why broker.Validate doesn't already catch that shape.
| incomingSlice := splitInlineHosts(req.Services) | ||
| // rebindStale=false: direct admin upsert must not silently rewrite a | ||
| // caller-supplied Name onto a different existing entry. |
There was a problem hiding this comment.
🔴 Host-only auth rotation (the documented happy path POST {host, auth} with no name) silently destroys an existing service's Substitutions because the upsert merge at handle_services.go:404 does a full struct replacement, while the proposal-flow's MergeServices at merge.go:55-62 explicitly preserves Substitutions when omitted. A Twilio service with substitutions:[{key:TWILIO_ACCOUNT_SID, placeholder:account_sid, in:[path]}] loses its substitution on rotation, the literal account_sid passes through to Twilio, and every /Accounts/{SID}/* call returns 404. Fix is to mirror MergeServices's preserve-when-empty semantics in the upsert loop.
Extended reasoning...
What the bug is
The PR adds adoptByHost(incomingSlice, existing, false) at handle_services.go:386 so that a POST {services:[{host:"api.twilio.com", auth:NEW}]} (no name, no substitutions) adopts the existing service’s Name by unique (Host, Path) match — the documented happy path per skill_http.md:223 (“Optional for set when host uniquely matches existing — server adopts that entrys name”) and docs/learn/services.mdx. The merge a few lines later at handle_services.go:404 is existing[idx] = svc— a full struct replacement. Because the JSON omittedsubstitutions`, the replaced entry’s Substitutions field is nil. The literal placeholder string is no longer rewritten on the wire.
The proposal-apply path explicitly handles the same scenario differently. In MergeServices (merge.go:55-62):
case exists:
next := toBrokerService(p)
// Empty Substitutions means "leave existing alone";
// callers clear by delete+recreate.
if len(p.Substitutions) == 0 {
next.Substitutions = merged[idx].Substitutions
}
merged[idx] = nextSo proposal-set preserves substitutions on omit; direct upsert does not. The PR opens up the host-only-no-name surface but does not bring the same preservation across.
Why existing safeguards do not catch it
broker.Validateonly checks structural and intra-slice properties — it has noexistingcontext.- The new
TestServicesUpsertEmptyNameAdoptsExistingByHostPathseeds a service with no substitutions and no explicit Enabled, so neither preservation case is exercised. TestServicesUpsertReplaceExisting(pre-existing) tests explicit-name replace without seeded substitutions.
Addressing the “pre-existing / intentional design” objection
The refutation correctly notes the merge loop pre-existed and that cmd/service.go documents the CLI as “upsert by name … it is replaced”. Two reasons this is still worth flagging:
-
The PR newly opens the happy path that reaches the destructive merge. Pre-PR, an admin had to know and supply the canonical name to upsert; the rotation footgun was a consequence of an explicit replace-by-name operation. Post-PR, an admin POSTs
{host, auth}with no name as the documented rotation flow, the server resolves Name on their behalf viaadoptByHost, and the same full-replace silently destroys substitutions. The reachability via natural agent flow is what changed. -
The asymmetry with the proposal path is documented as a contract.
merge.go:57-58comments “Empty Substitutions means leave existing alone; callers clear by delete+recreate”, anddocs/learn/services.mdxcalls out “Auth-only or enable/disable-only set proposals preserve existing substitutions”. The proposal flow does not advertise general partial-update semantics — it specifically preserves Substitutions. The upsert handler, now reached by the same natural flow, breaks that contract on the substitution surface that exists precisely for in-place rotation.
I am setting this as normal rather than pre_existing because the substitution asymmetry is the new triggering surface. The enabled-half of the original framing is less clear-cut — MergeServices also drops Enabled=nil on {action:"set", host, auth:NEW} proposals — so I am scoping this comment to the substitution case.
Step-by-step proof (substitution case)
- Vault state:
[{name:"twilio", host:"api.twilio.com", auth:{type:"basic", username:"OLD_USER", password:"OLD_PASS"}, substitutions:[{key:"TWILIO_ACCOUNT_SID", placeholder:"__account_sid__", in:["path"]}]}]. - Admin rotates auth via the documented happy path:
POST /v1/vaults/default/services {"services":[{"host":"api.twilio.com","auth":{"type":"basic","username":"NEW_USER","password":"NEW_PASS"}}]}. Noname, nosubstitutions. handleServicesUpsertat line 386 callsadoptByHost(incomingSlice, existing, false).incoming[0].Name=="",hpCount[{api.twilio.com,""}]==1, adoptsName="twilio".broker.Validatepasses (slug valid, host valid, basic auth valid).byName["twilio"]==0, so the upsert loop fires at line 404:existing[0] = incomingSlice[0]. The incoming entry hasSubstitutions=nil(omitted from JSON), soexisting[0].Substitutionsbecomes nil. Auth is rotated; substitution is destroyed.- Persisted JSON:
[{name:"twilio", host:"api.twilio.com", auth:NEW}]— no substitutions. - Response:
{upserted:["twilio"], services_count:1}— no warning. - Next proxy request to
POST https://api.twilio.com/2010-04-01/Accounts/__account_sid__/Messages.json— the literal__account_sid__is no longer rewritten and passes through to Twilio’s API → 404.
Fix
Mirror MergeServices’s preserve-when-empty semantics in handleServicesUpsert immediately after adoptByHost:
if idx, ok := byName[svc.Name]; ok {
if len(svc.Substitutions) == 0 {
svc.Substitutions = existing[idx].Substitutions
}
existing[idx] = svc
}A regression test should seed a service with substitutions, POST a host-only rotation with no substitutions, and assert the substitutions survive in the stored services_json.
| for i := range services { | ||
| svc := &services[i] | ||
| if svc.Name != "" { | ||
| if !rebindStale || nameSet[svc.Name] { | ||
| continue | ||
| } | ||
| } | ||
| k := hp{svc.Host, svc.Path} | ||
| if hpCount[k] == 1 { | ||
| svc.Name = hpName[k] | ||
| } | ||
| } |
There was a problem hiding this comment.
🔴 The proposal-flow adoptByHost call (handle_services.go:182) uses rebindStale=true, but the loop at lines 117-128 cannot distinguish a stale auto-adopted Name from a deliberate explicit Name — both have a non-empty Name that's not in the existing nameSet. When a user POSTs a proposal with a deliberate distinct name (e.g. stripe-staging) targeting a host that already has a service under a different name (stripe-prod at api.stripe.com), the user's name is silently rewritten to stripe-prod, and MergeServices then in-place replaces it — rotating production auth against the user's actual intent. The direct admin upsert path explicitly passes rebindStale=false to guard against this exact pattern (per the comment at handle_services.go:385-387); the proposal path takes the dangerous side of the same trade-off. Fix: tag per-entry auto-slug provenance so only auto-adopted Names rebind, or return 409 when a rebind would change a user-supplied Name.
Extended reasoning...
What the bug is
normalizeProposalServices at handle_services.go:182 calls adoptByHost(view, existing, true). The relevant branch at lines 117-128 reads:
if svc.Name != "" {
if !rebindStale || nameSet[svc.Name] {
continue
}
}
k := hp{svc.Host, svc.Path}
if hpCount[k] == 1 {
svc.Name = hpName[k]
}For non-empty svc.Name with rebindStale=true and nameSet[svc.Name]==false (i.e. the user-supplied Name does not match any existing service), control falls through and the Name is rebound to whichever existing service uniquely owns the (Host, Path). The intent of the rebind was to close a create→rename→apply race where a previously auto-adopted Name became stale — but the implementation has no way to distinguish stale-auto-slug from deliberate-distinct-name, so it rewrites both.
Step-by-step proof
- Vault state:
[{Name:"stripe-prod", Host:"api.stripe.com", Auth:OLD}]. - User POSTs
/v1/proposalswith{services:[{action:"set", name:"stripe-staging", host:"api.stripe.com", auth:NEW}]}— intent: add a parallel staging service at the same host. normalizeProposalServicesentersadoptByHost.nameSet={"stripe-prod"},hpCount[{api.stripe.com,""}]=1,hpName="stripe-prod".view[0].Name="stripe-staging"is non-empty,rebindStale=true,nameSet["stripe-staging"]==false→ continue gate is(false || false) = false, falls through.hpCount[k]==1→view[0].Name="stripe-prod"(silently rewritten).- Proposal persisted with
services_jsoncontainingname="stripe-prod". - Admin approves.
MergeServices(merge.go:50-66):nameIndex={"stripe-prod":0}, hits in-place replace branch →merged[0] = toBrokerService(p)overwrites the entire entry including Auth. - Result:
stripe-prodAuth rotated to NEW. The user'sstripe-stagingservice is never created. The create response (handle_proposals.go:233-239) reports success with no rename signal.
Why existing safeguards don't catch it
proposal.Validateonly checks intra-proposal duplicate Names (validate.go:60-65) — has no view of existing.broker.Validateonly checks duplicate Names within the merged slice, never Hosts.- The PR's new tests cover empty-Name adopts (
TestProposalCreateActionSetEmptyNameAdoptsExistingByHostPath) and stale-auto-slug rebinds where the rename was legitimate (TestAdminProposalApplyRebindsStaleNameByHost). No test pins the deliberate-distinct-name case.
The asymmetry is documented in the source: handleServicesUpsert deliberately passes rebindStale=false with a comment that "direct admin upsert must not silently rewrite a caller-supplied Name onto a different existing entry" (handle_services.go:385-387). The proposal flow takes the opposite side of the same trade-off.
Addressing the refutation
The refutation points out (correctly) that two services with identical (Host, Path) cannot both be matched by MatchService — the second would be unreachable on declaration-order tiebreak. That observation is true but addresses a different question. The bug is not "the user's stripe-staging would have worked at runtime." The bug is that the user's intent — add a new service — is silently transformed into a different operation — rotate the credentials on an existing service — without any error or warning. The refutation also notes the rewritten Name is visible via GET /v1/proposals/{id}; that is true but the create-response payload (which is what an automated agent inspects synchronously) carries only {id, status, vault, approval_url, message} with no rename signal, so the divergence is only discoverable by a separate proactive fetch.
The refutation also frames the asymmetry with handleServicesUpsert as intentional code reuse. The author did recognize the danger — the explicit rebindStale=false comment proves it — but the proposal path takes the side of that trade-off that the direct path explicitly refused, and the failure mode (silent production credential rotation against user intent) is materially worse than the alternative (a "new-service vs unique-host" 409 telling the caller their host is already taken).
Fix
Per the bug description, three reasonable shapes:
- Per-entry provenance: tag entries at create time as "auto-adopted" vs "explicitly named", and only rebind the former at apply.
- 409 on rewrite: when a rebind would change a user-supplied Name, return a structured 409 with the existing service's Name as a candidate, mirroring the host-ambiguity 409 already in place for unnamed deletes.
- Drop the create-time rebind: only rebind at apply time, and only when the proposal's Name is not in
existingAND the proposal's Name is not in any new entries — narrows the window to the actual race.
| ```bash | ||
| # Add a service (non-destructive upsert by name; --name and --host are both required) | ||
| # Add a service (non-destructive upsert by name; --host is required, --name optional — | ||
| # when omitted the server slugifies host+path, e.g. api.stripe.com → api-stripe-com) |
There was a problem hiding this comment.
🟡 Two stale lines in cmd/skill_cli.md still describe the auto-slug-on-write behavior that commit 8d0f84e explicitly reverted ("require explicit name on writes, adopt by host for updates"). Line 99 (the "Add a service" comment in the bash example) and line 193 (the proposal-create flag-driven mode header) both promise that the server "slugifies host+path" when --name is omitted — but the server now returns 400 service N: name is required whenever no existing service uniquely matches the host. Line 262 of the same file already states the correct contract, so the file contradicts itself. The fix is a two-site prose update mirroring line 262 — no runtime impact, nit.
Extended reasoning...
What the bug is
The PR description advertises auto-slug-on-write ("services without name are slugged via broker.Slugify(host, path) at write time") but the strategy was reverted in commit 8d0f84e (the current HEAD) to "require explicit name on writes, adopt by host for updates". The follow-up docs-alignment commits (8ad4ac9, 8d0f84e) cleaned up most paired sites — including cmd/skill_cli.md:262, which now correctly reads:
services[].name-- ... Required for"set"when creating a new service — pick a deliberate name. May be omitted only whenhost+pathuniquely matches an existing service in the vault: the server adopts that entry's name, the same pattern as"delete"by host.
But two siblings in the same file still describe the removed auto-slug-on-write behavior:
-
Line 98–99 (inside the "Managing Services Directly" bash code block):
# Add a service (non-destructive upsert by name; --host is required, --name optional — # when omitted the server slugifies host+path, e.g. api.stripe.com → api-stripe-com) -
Line 193 (proposal-create flag-driven mode header):
Flag-driven mode (common cases). When
--hostis provided,--nameis optional — when omitted, the server slugifieshost+path(e.g.api.stripe.com→api-stripe-com):
Why the current code does not catch this
Pure prose drift — no test covers help/skill text. The paired sibling docs (cmd/skill_http.md, docs/reference/cli.mdx, docs/learn/services.mdx, the cobra flag descriptions in cmd/service.go and cmd/proposal_create.go) were all updated by the docs-alignment commits. cmd/skill_cli.md is the only file where the two sites at lines 99/193 drifted — cmd/skill_cli.md:262 was updated, but lines 99 and 193 were not.
Impact
The skill file is the canonical document the agent reads to learn how to drive the CLI. An agent reading lines 99/193 will attempt the documented happy path against a fresh host:
agent-vault vault proposal create --host api.stripe.com --auth-type bearer --token-key STRIPE_KEY
(no --name). The CLI builds a proposal with name="". normalizeProposalServices (handle_services.go) calls adoptByHost(rebindStale=true) — finds no host match — leaves Name empty. proposal.Validate then rejects with 400 service 0: name is required. The agent has no recovery path because the file just told it --name is optional. The same agent that reads down to line 262 finds the correct contract — the contradiction is internal to one file.
This is the same paired-docstring drift the recent docs-alignment commits target. Severity is nit (no runtime impact, documentation-only) but it directly negates the work done in 8d0f84e.
Step-by-step proof
- Read
cmd/skill_cli.md:99:# when omitted the server slugifies host+path, e.g. api.stripe.com → api-stripe-com. - Read
cmd/skill_cli.md:193:**Flag-driven mode ... When --host is provided, --name is optional — when omitted, the server slugifies host+path ...**. - Read
cmd/skill_cli.md:262:services[].name ... Required for "set" when creating a new service ... May be omitted only when host + path uniquely matches an existing service. - Same file, three locations, two of them contradict the third.
- Confirm runtime:
internal/broker/broker.go:ValidateenforcesName != ""(pinned byTestValidateConfigRejectsMissingName).internal/server/handle_services.go:handleServicesUpsertcallsadoptByHost(incoming, existing, rebindStale=false)thenbroker.Validate— empty Name with no host match → 400 (pinned byTestServicesUpsertRejectsMissingNameForNewServiceandTestProposalCreateActionSetRejectsMissingNameForNewService). - Confirm scope:
grep -n slugif cmd/skill_cli.mdreturns exactly lines 99 and 193 plus the unambiguous legacy-on-read mention.grep -n slugif cmd/skill_http.mdreturns nothing — paired file was updated correctly.
How to fix
Mirror the wording the PR already applied at line 262 to the two stale sites:
- Line 98–99 →
# Add a service (non-destructive upsert by name; --host and --name are both required for new services; --name may be omitted only when --host uniquely matches an existing service — server adopts that name). - Line 193 →
**Flag-driven mode (common cases). When --host is provided, --name is required for new services; may be omitted only when --host uniquely matches an existing service (server adopts that name):**.
Same one-line edit pattern the docs-alignment commits applied elsewhere.
Summary
Fixes the "Invalid services: service N: name is required" error users hit in the Edit Service sidebar after PR #164. The vault held two-plus services persisted before
namebecame required; opening any one for edit and clicking Save failed validation on the unedited siblings.PR #164's commit message advertised auto-naming at write time + lazy heal on read, but neither path was implemented (
broker.Slugifydid not exist;loadServicesonly split inline hosts). After review iteration, this PR lands a tighter contract than the original auto-everywhere design:loadServices,/discover,credential.Injectauto-slugify emptyNamefromhost+pathper read.nameis required. Emptynamewith no host match → 400 "name is required".namewhose(host, path)uniquely matches an existing service adopts that entry's name.name(e.g. existing service was renamed between create and apply) rebinds by(host, path).This unifies set/upsert with the existing
ActionDelete-by-host pattern: agents that know the host but not the canonical name can still target an existing service, but the server never silently invents a name for a brand-new service.Implementation
Broker helpers (
internal/broker/broker.go)Slugify(host, path)— deterministic,ValidateSlug-conformant id derivation. Strips wildcards, normalizes non-[a-z0-9]chars to-, collapses, trims, truncates to 64, pads<3chars.AssignSlugNames(services)— read-path heal. Fills emptyNameviaSlugify+DisambiguateSlug(intra-slice disambiguation only).DisambiguateSlug(base, taken)— lowestbase-<n>(n≥2) not in taken, length-bounded.AssignSlugNamesAvoiding(services, existing)— kept for existing-state-aware heal; reservation branch is now defensive (write paths useadoptByHostinstead, see below).Read-side heal — three ingress points
loadServices(handle_services.go) — heals on every read; the slug becomes durable on the next write.handleDiscover(handle_discovery.go) — agents identify services bynameper skill docs; blankNamewould make a service un-addressable from/discover.StoreCredentialProvider.Inject(brokercore/credential.go) —MatchedNamelands in the request log +X-Vault-Serviceheader; the documented?service=<name>log filter depends on this.Write-side adopt-by-host — server handlers (
internal/server/handle_services.go)adoptByHost(services, existing, rebindStale)— single helper used by both writes and proposals. EmptyNameadopts a unique(Host, Path)match in existing. WhenrebindStale=true, non-emptyNamethat misses existing's nameset is also rebound — proposals passtrueto close the create→rename→apply race; direct admin upserts passfalseso a caller-supplied name is never silently rewritten.handleServicesUpsert(POST) →adoptByHost(..., false).handleServicesSet(PUT) → no adopt; PUT is replace-all, callers round-trip names from GET.normalizeProposalServices→adoptByHost(..., true)forActionSetentries with non-emptyHost. Called at both proposal create AND proposal apply, so the rebind heals the rename race at apply time.CLI + docs
cmd/service.go,cmd/proposal_create.go— flag-help reflects the new contract: "Required for new services; may be omitted when--hostuniquely matches an existing service."cmd/skill_http.md,cmd/skill_cli.md,docs/learn/services.mdx,docs/reference/cli.mdx,AGENTS.md— same wording across every agent-facing surface.Unrelated thread bundled here: removal of the deprecated
AGENT_VAULT_SESSION_TOKENenv-var alias from server, CLI, SDK, and docs (commit2a4c7db). Self-contained; the env var was already documented as deprecated.Test plan
go test ./...green.Slugify/AssignSlugNames/DisambiguateSlug/AssignSlugNamesAvoidingunit tests ininternal/broker/broker_test.go.TestServicesUpsertRejectsMissingNameForNewService,TestServicesUpsertEmptyNameDoesNotReplaceUnrelatedExisting,TestProposalCreateActionSetRejectsMissingNameForNewService,TestProposalCreateActionSetEmptyNameUnrelatedHostRejects— all assert 400 withname is requiredinstead of silent auto-slug-and-overwrite.TestServicesUpsertEmptyNameAdoptsExistingByHostPath,TestProposalCreateActionSetEmptyNameAdoptsExistingByHostPath— emptyName+ matching host → adopts the existing entry's name in place.TestAdminProposalApplyRebindsStaleNameByHost— admin renamesapi-stripe-com→stripe-prodbetween proposal create and approve; apply rebinds and updatesstripe-prodinstead of appending a ghost.TestDiscoverHealsLegacyUnnamedServices,TestInject_HealsLegacyUnnamedServiceMatchedName,TestLegacyUnnamedServicesGetSetRoundTrip— pins heal on/discover, proxy hot path, and GET→modify-one→PUT.Closes the immediate user-reported bug from the post-PR-#164 release.