[DPE-9049] feat: add pre-refresh/post-refresh hooks and migrate-data daemon for bidirectional storage layout migration #277

Open
marceloneppel wants to merge 6 commits into 16/edge from feat/versioned-storage-layout

@marceloneppel commented May 6, 2026

Issue

PostgreSQL warns against using a mount point directly as a data directory (for example, the lost+found directory at the mount root causes initdb to fail). The data in all four Juju-managed storage mounts therefore needs to move from the mount root into a versioned 16/main subdirectory:

Storage  Mount point                        Old directory  New directory
data     ${SNAP_COMMON}/var/lib/postgresql  ${MOUNT}       ${MOUNT}/16/main
archive  ${SNAP_COMMON}/data/archive        ${MOUNT}       ${MOUNT}/16/main
logs     ${SNAP_COMMON}/data/logs           ${MOUNT}       ${MOUNT}/16/main
temp     ${SNAP_COMMON}/data/temp           ${MOUNT}       ${MOUNT}/16/main

The charm-side counterpart (path constants, Patroni config, _ensure_storage_layout(), temp tablespace catalog migration) is in canonical/postgresql-operator#1649.

Solution

Add snap hooks that perform bidirectional data migration at snap refresh time, so upgrades and rollbacks both work transparently.

New files

  • snap/hooks/pre-refresh — runs before snapd switches revisions. Starts the migrate-data daemon and waits up to 120 s for it to finish. Handles the reverse (rollback) path: data must be back at the mount root before the older snap revision starts Patroni with root-layout config.
  • snap/hooks/post-refresh — runs after snapd switches revisions. Starts the same daemon for the forward (upgrade) path as a belt-and-suspenders backup (the charm's _ensure_storage_layout() is the primary forward migrator).
  • snap/local/migrate-data.sh — the migration script, which can be started on demand with snapctl start. It runs as _daemon_ (via setpriv.sh) so that it can modify the _daemon_-owned snap data files.

Migration logic (migrate-data.sh)

Direction detection: reads the already-rendered Patroni YAML at $SNAP_DATA/etc/patroni/patroni.yaml. If it contains 16/main, the charm expects the versioned layout, so the script runs the forward migration; otherwise it runs the reverse migration.
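The direction check described above can be sketched as a small POSIX shell function. This is an illustrative sketch, not the code in migrate-data.sh: the name detect_direction is hypothetical, and it assumes a plain substring match on 16/main is enough, as the description implies.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not taken from the PR): choose the migration direction
# from the Patroni config the charm has already rendered.
# Prints "forward" if the config references the versioned 16/main layout,
# "reverse" otherwise; returns 1 if the config cannot be read.
detect_direction() {
    local config="$1"
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 1
    fi
    # -F: literal string match, -q: report only through the exit status.
    if grep -qF '16/main' "$config"; then
        echo forward
    else
        echo reverse
    fi
}
```

A caller would use it as direction=$(detect_direction "$SNAP_DATA/etc/patroni/patroni.yaml"). A substring check avoids needing a YAML parser, at the cost of also matching 16/main in unrelated keys or comments.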

Forward migration (root → 16/main):

  • Creates versioned subdirectories under each storage mount.
  • Moves files one by one from the mount root into the versioned path (skipping 16/, lost+found, and any existing pg_wal symlink).
  • Repairs the pg_wal symlink to point at the new versioned path on the logs mount.
  • Skips the temp tablespace (ephemeral and root-owned by Juju; the charm recreates it via SQL DROP/CREATE TABLESPACE).
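A minimal sketch of the forward steps for a single mount, in POSIX shell. forward_one and repair_pg_wal are hypothetical names, not functions from migrate-data.sh, and the sketch assumes it already runs as the file owner and that no name at the mount root clashes with one under 16/main (there is no merge logic here).

```shell
#!/bin/sh
set -eu

# Hypothetical sketch: move one storage mount from the root layout into
# <mount>/16/main. Assumes the caller runs as the file owner (the PR uses
# _daemon_ via setpriv.sh) and that no names clash with 16/main contents.
forward_one() {
    local mount="$1"
    local target="$mount/16/main"
    local entry name
    mkdir -p "$target"
    # The glob list (including dotfiles) is expanded once, before any mv runs.
    for entry in "$mount"/* "$mount"/.[!.]* "$mount"/..?*; do
        # Unmatched glob patterns stay literal; skip them.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        name=${entry##*/}
        case "$name" in
            16 | lost+found) continue ;;  # versioned tree and fs metadata stay put
        esac
        # An existing pg_wal symlink is not moved; repair_pg_wal recreates it.
        if [ "$name" = pg_wal ] && [ -L "$entry" ]; then
            continue
        fi
        mv -- "$entry" "$target/"
    done
}

# Point <data_dir>/pg_wal at the WAL directory on the logs mount.
repair_pg_wal() {
    local data_dir="$1" wal_dir="$2"
    # Only ever replace a symlink; refuse to clobber a real pg_wal directory.
    if [ -d "$data_dir/pg_wal" ] && [ ! -L "$data_dir/pg_wal" ]; then
        echo "$data_dir/pg_wal is a real directory; not replacing it" >&2
        return 1
    fi
    ln -sfn -- "$wal_dir" "$data_dir/pg_wal"
}
```

For the data mount this would be forward_one "$DATA_MOUNT" followed by repair_pg_wal "$DATA_MOUNT/16/main" "$LOGS_MOUNT/16/main" (both variable names are placeholders). The temp mount is simply never passed to forward_one, and this sketch leaves the old root-level pg_wal symlink in place.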

Reverse migration (16/main → root):

  • Before moving data, reconciles the PostgreSQL temp tablespace catalog entry back to the mount root path (using the direct psql binary with LD_LIBRARY_PATH to bypass the Perl wrapper).
  • Merges versioned contents back into the mount root with a file-by-file directory merge for robustness.
  • Removes the versioned temp directory instead of merging it back (the temp tablespace is handled separately by the charm).
  • Restores the pg_wal symlink in the root data directory.
  • Sets the data directory's mode to 700, as PostgreSQL requires.
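The reverse path can be sketched the same way. merge_into, reverse_one and reverse_all are hypothetical names; the sketch assumes the versioned copy wins on a name clash and that the root layout's pg_wal points at the logs mount root, and it leaves out the tablespace catalog step, which needs a running server.

```shell
#!/bin/sh
set -eu

# Hypothetical sketch: move every entry of $1 into $2, recursing when a real
# directory exists on both sides. Assumption: on a clash the versioned
# (source) entry replaces whatever is at the destination.
merge_into() {
    local src="$1" dst="$2" entry name
    for entry in "$src"/* "$src"/.[!.]* "$src"/..?*; do
        [ -e "$entry" ] || [ -L "$entry" ] || continue   # unmatched glob
        name=${entry##*/}
        if [ -d "$entry" ] && [ ! -L "$entry" ] &&
           [ -d "$dst/$name" ] && [ ! -L "$dst/$name" ]; then
            merge_into "$entry" "$dst/$name"   # real directory on both sides
            rmdir -- "$entry"                  # now empty
        else
            if [ -e "$dst/$name" ] || [ -L "$dst/$name" ]; then
                rm -rf -- "$dst/$name"
            fi
            mv -- "$entry" "$dst/$name"
        fi
    done
}

# Merge <mount>/16/main back into <mount>; a no-op if already at the root.
reverse_one() {
    local mount="$1"
    [ -d "$mount/16/main" ] || return 0
    merge_into "$mount/16/main" "$mount"
    rmdir -- "$mount/16/main"
    rmdir -- "$mount/16" 2>/dev/null || true   # keep 16/ if anything else is there
}

# Hypothetical driver: data, archive and logs are merged back; temp is only
# removed, since its tablespace is recreated by the charm.
reverse_all() {
    local data="$1" archive="$2" logs="$3" temp="$4"
    reverse_one "$data"
    reverse_one "$archive"
    reverse_one "$logs"
    rm -rf -- "$temp/16"
    # Re-point pg_wal at the logs root, but never replace a real directory.
    if [ -L "$data/pg_wal" ] || [ ! -e "$data/pg_wal" ]; then
        ln -sfn -- "$logs" "$data/pg_wal"
    fi
    chmod 700 "$data"
}
```

reverse_all takes the four mount roots in the order data, archive, logs, temp. Recursing into directories that exist on both sides, rather than issuing one mv per top-level entry, is what lets the step be rerun after an interrupted attempt: mv refuses to replace a non-empty directory.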

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

…bidirectional storage layout migration

Context: New pre-refresh and post-refresh hooks that start a one-shot migrate-data daemon via snapctl. The daemon runs as _daemon_
(via setpriv.sh) to bypass snap confinement restrictions on daemon-owned files. It reads the already-rendered Patroni YAML to detect
direction — forward (root → 16/main) or reverse (16/main → root) — and migrates all four storage mounts (data, archive, logs, temp)
with a file-by-file directory merge for robustness.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
The temp storage mount is owned by root (juju-managed) and may contain
root-owned subdirectories created by the charm's _ensure_storage_layout.
The migrate-data daemon runs as _daemon_ and cannot chmod or rename
files inside a root-owned temp root, causing the reverse migration to
fail with "Operation not permitted".

Skip forward migration of temp entirely (ephemeral, recreated by the charm)
and use rm -rf instead of reverse_one for the reverse path. Also remove
temp from all chmod loops in the reverse migration branch.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
When the charm is rolled back to a revision that pins an older snap
without versioned storage, the pre-refresh hook must reconcile the
PostgreSQL temp tablespace catalog from the versioned path (16/main)
back to the storage mount root. Without this, the daemon starts with
a catalog entry pointing to an empty or stale directory.

- Extract the operator superuser password from patroni.yaml via
  python3+PyYAML and export as PGPASSWORD.
- Use the direct postgresql binary with LD_LIBRARY_PATH set,
  bypassing the Perl-based psql wrapper that cannot resolve its
  modules inside the snap context.
- Run the reverse catalog migration before the PG_VERSION early-exit
  guard so the tablespace is always reconciled, even when persistent
  data was already moved back on a previous cycle.
- Read patroni.yaml from $SNAP_DATA (writable snap data), not $SNAP
  (read-only mount).
- Guard the password extraction against $SNAP being unset so that
  set -u does not crash the daemon, which would trigger a restart
  loop and disrupt Patroni raft leader election.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
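The password extraction and the $SNAP guard from the bullets above can be sketched as one helper. extract_superuser_password is hypothetical, the interpreter location under $SNAP is an assumption, and the key path postgresql.authentication.superuser.password is Patroni's standard config layout rather than something this PR spells out.

```shell
#!/bin/sh
set -eu

# Hypothetical sketch: print the superuser password from a rendered
# patroni.yaml. Assumes PyYAML is importable by the chosen interpreter.
extract_superuser_password() {
    local config="$1" py
    # ${SNAP:-} keeps "set -u" from aborting when SNAP is not in the
    # environment; the step is skipped instead of crashing the caller.
    if [ -z "${SNAP:-}" ]; then
        echo "SNAP is not set; skipping password extraction" >&2
        return 1
    fi
    py="$SNAP/usr/bin/python3"   # hypothetical interpreter location
    [ -x "$py" ] || py=python3   # fall back to python3 on PATH
    "$py" - "$config" <<'EOF'
import sys
import yaml

with open(sys.argv[1]) as f:
    cfg = yaml.safe_load(f)
print(cfg["postgresql"]["authentication"]["superuser"]["password"])
EOF
}
```

A caller could write if pw=$(extract_superuser_password "$SNAP_DATA/etc/patroni/patroni.yaml"); then export PGPASSWORD="$pw"; fi, so a missing SNAP or a parse failure leaves PGPASSWORD unset instead of killing the script.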
…n script

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…n script

Run migrate-data.sh directly from pre/post-refresh hooks instead of
starting a systemd service, avoiding race conditions with snap confinement.
Skip DDL on replicas during reverse migration by checking pg_is_in_recovery(),
preventing ReadOnlySqlTransaction errors in multi-unit rolling rollbacks.
Remove the unused migrate-data snap daemon definition from snapcraft.yaml.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
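The replica check in this commit can be sketched as a small guard around psql. is_primary is a hypothetical name, and the caller is assumed to pass the direct psql binary (with LD_LIBRARY_PATH already exported) followed by its connection options.

```shell
#!/bin/sh
set -eu

# Hypothetical sketch of the replica guard. $1 is the psql binary; any
# remaining arguments are passed through as connection options.
# Returns 0 on the primary, 1 on a replica, 2 if the query itself failed.
is_primary() {
    local psql="$1" in_recovery
    shift
    # -t: tuples only, -A: unaligned output, -c: run a single command.
    # pg_is_in_recovery() prints "t" on a standby and "f" on the primary.
    if ! in_recovery=$("$psql" "$@" -tAc 'SELECT pg_is_in_recovery();'); then
        return 2
    fi
    [ "$in_recovery" = f ]
}
```

The reverse tablespace DDL would then run only when is_primary returns 0 and be skipped when it returns 1, which avoids the ReadOnlySqlTransaction error on read-only replicas; a return of 2 signals that the server could not be queried at all.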
…rage-layout

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@marceloneppel changed the title from "feat: add pre-refresh/post-refresh hooks and migrate-data daemon for bidirectional storage layout migration" to "[DPE-9049] feat: add pre-refresh/post-refresh hooks and migrate-data daemon for bidirectional storage layout migration" on May 25, 2026
@marceloneppel marked this pull request as ready for review on May 25, 2026 20:45
@marceloneppel requested a review from a team as a code owner on May 25, 2026 20:45
@marceloneppel requested reviews from carlcsaposs-canonical, dragomirp, juju-charm-bot and taurus-forever and removed the request for a team on May 25, 2026 20:45