Skip to content

fix(pt): prepend CLUSTER_NAME for restart/reinit/switchover/failover (fixes #29)#30

Open
CrzMarvin wants to merge 1 commit into
pgsty:mainfrom
CrzMarvin:fix/patronictl-cluster-name-prepend
Open

fix(pt): prepend CLUSTER_NAME for restart/reinit/switchover/failover (fixes #29)#30
CrzMarvin wants to merge 1 commit into
pgsty:mainfrom
CrzMarvin:fix/patronictl-cluster-name-prepend

Conversation

@CrzMarvin
Copy link
Copy Markdown

Summary

Fixes #29pig pt restart, pig pt reinit, pig pt switchover, and pig pt failover all fail because the underlying patronictl invocation omits the required CLUSTER_NAME positional argument.

Unlike patronictl pause/resume/list, those four subcommands do not accept -c <config> as a substitute for scope — they require CLUSTER_NAME as the first positional argument.

Reproduction (from #29)

$ sudo -u dba pig pt restart --pending -f
$ /usr/bin/patronictl -c /etc/patroni/patroni.yml restart --force --pending
Error: Missing argument 'CLUSTER_NAME'.

$ sudo -u dba pig pt restart pg-nms-1 -f
# patronictl gets `restart --force pg-nms-1` → treats pg-nms-1 as CLUSTER_NAME
# → etcd PermissionDenied (no such cluster key)

$ sudo -u dba pig pt restart pg-nms pg-nms-1 -f
# pig only consumes args[0] = "pg-nms" → patronictl runs `restart pg-nms --force`
# → restarts WHOLE cluster instead of just pg-nms-1 (surprising and undocumented)

Fix shape

  • New exported GetClusterName(dbsu) in cli/patroni/patroni.go — reads scope: from /etc/patroni/patroni.yml, falls back to utils.DBSUCommandOutput(dbsu, [\"cat\", DefaultConfigPath]) for the typical Pigsty layout where the file is postgres:postgres 0640.
  • Pure args builders extracted: buildRestartArgs / buildReinitArgs / buildSwitchoverArgs / buildFailoverArgs — testable without exec'ing patronictl. Each prepends `cluster` immediately after the verb.
  • Restart / Reinit / Switchover / Failover now resolve the cluster name once at the top, then call their respective builder + runPatronictl.
  • Misleading comment in original `Restart` ("Since we use -c config, cluster name is auto-detected" — only true for pause/resume/list) removed.

`cli/context/context.go` has a similar helper (`getPatroniClusterName`) — left untouched to keep this PR focused; happy to follow up with a refactor to reuse `GetClusterName` if you'd like.

Tests

`cli/patroni/patroni_test.go` extended:

  • `TestBuildRestartArgs` — table-driven, 4 cases (nil opts / pending+force / specific member+force / role filter). Verifies cluster appears at `args[1]` and the flag/positional layout matches patronictl's CLI contract.
  • `TestBuildReinitArgs / TestBuildSwitchoverArgs / TestBuildFailoverArgs` — same shape for the other three commands.
  • `TestPatronictlPositionalContract` — regression guard. Documents the invariant inline so a future refactor that drops the prepend fails fast with a self-explanatory message.

Validation

  • `go build ./...` clean
  • `go vet ./...` clean
  • `go test ./...` — every package green (`cli/patroni` 3.818s, full suite ~80s)
  • Manually exercised the fix path on a 2-node Pigsty v4.3 cluster (Ubuntu 24.04, patroni 4.1.2, etcd3 DCS) — `Restart` now passes through to `patronictl restart pg-nms [member] --force --pending` correctly, clears pending restart markers as expected.

Out of scope (kept minimal)

  • `cli/context/context.go`'s `getPatroniClusterName` (duplicate logic) — happy to refactor in a follow-up.
  • `buildSwitchoverCommand` / `buildFailoverCommand` (for the Plan display surface) — those return what the user would TYPE (`pig pt switchover ...`), not patronictl args, so they're correct as-is.

`patronictl restart|reinit|switchover|failover` all require CLUSTER_NAME
as the first positional argument. Unlike `pause/resume/list`, the
`-c <config>` flag does NOT supply scope to these four subcommands —
without the cluster name they error with either:

  - "Missing argument 'CLUSTER_NAME'"
  - patronictl interpreting a member name as the cluster scope and
    erroring out at the etcd permission boundary
  - silently restarting the wrong member set when the pig wrapper drops
    trailing positional args

The four Restart/Reinit/Switchover/Failover functions in
cli/patroni/patroni.go built `patronictl` args without prepending the
cluster scope. The misleading comment in `Restart` claimed `-c config`
auto-detects the cluster name — true for pause/resume/list, not for
these subcommands.

Fix:
- Add exported `GetClusterName(dbsu)` reading `scope:` from
  /etc/patroni/patroni.yml with a DBSU fallback for the typical Pigsty
  layout where the file is postgres:postgres 0640.
- Extract pure `buildRestartArgs / buildReinitArgs / buildSwitchoverArgs
  / buildFailoverArgs` helpers (testable without exec'ing patronictl).
- Have Restart/Reinit/Switchover/Failover resolve cluster name once at
  the top, then delegate to the args builder + runPatronictl.

Tests:
- TestBuildRestartArgs (table-driven: nil opts / pending+force /
  member+force / role filter) — verifies cluster appears at args[1] and
  flag/positional layout matches patronictl's CLI contract.
- TestBuildReinitArgs / TestBuildSwitchoverArgs / TestBuildFailoverArgs
  for the same shape.
- TestPatronictlPositionalContract is a regression guard documenting
  the constraint: if a future refactor drops the prepend, this fails.

Fixes pgsty#29.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pig pt restart/reinit/switchover/failover: patronictl CLUSTER_NAME never prepended → all four fail

1 participant