`promote-to-primary` action can break async-replication if primary unit on secondary cluster takes over sync role first

In a dual-cluster setup with asynchronous replication (2+2 units), after running `promote-to-primary scope=unit`, the other unit of the primary cluster is stuck in the `Replica` role and never becomes `Sync Standby`.

Summary of the setup:

| Member       | IP address   | Site      | Patroni status                     |
| ------------ | ------------ | --------- | ---------------------------------- |
| postgresql-0 | 10.10.128.24 | Primary   | Sync Standby (primary cluster)     |
| postgresql-1 | 10.10.128.23 | Primary   | Leader (primary cluster)           |
| postgresql-0 | 10.10.118.24 | Secondary | Standby Leader (secondary cluster) |
| postgresql-1 | 10.10.118.23 | Secondary | Replica (secondary cluster)        |


## Steps to reproduce
1. On the primary cluster, run the `promote-to-primary` action to promote the secondary (sync standby) unit to primary:
```
juju run postgresql/0 -- promote-to-primary scope=unit
```
2. Observe the `patronictl list` output.

## Expected behavior
```
sudo patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml list
+ Cluster: postgresql (7633847608209757521) -+-----------+----+-----------+
| Member       | Host         | Role         | State     | TL | Lag in MB |
+--------------+--------------+--------------+-----------+----+-----------+
| postgresql-0 | 10.52.128.24 | Leader       | running   |  4 |           |
| postgresql-1 | 10.52.128.23 | Sync Standby | streaming |  4 |         0 |
+--------------+--------------+--------------+-----------+----+-----------+
```

## Actual behavior
```
sudo patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml list
+ Cluster: postgresql (7633847608209757521) --------+----+-----------+
| Member       | Host         | Role    | State     | TL | Lag in MB |
+--------------+--------------+---------+-----------+----+-----------+
| postgresql-0 | 10.52.128.24 | Leader  | running   |  4 |           |
| postgresql-1 | 10.52.128.23 | Replica | streaming |  4 |         0 |
+--------------+--------------+---------+-----------+----+-----------+
```


## Versions

Operating system: Ubuntu 24.04
Juju CLI: 3.6.14
Juju agent: 3.6.14
Charm revision: 1047
LXD: 5.21/stable

## Log output
Juju debug log: 

## Additional context

After the promotion, two nodes named `postgresql-0` appear in `pg_stat_replication`. The first, in the `sync` state, is the primary unit of the secondary cluster. The second, in the `potential` state, is the replica unit of the main cluster (the one we aimed to fail over to with the `promote-to-primary` action).

```
postgres=# SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication;
 application_name | client_addr  |   state   | sync_state 
------------------+--------------+-----------+------------
 postgresql-0     | 10.10.118.24 | streaming | sync
 postgresql-0     | 10.10.128.24 | streaming | potential
```
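
A quick way to spot this condition might be to check which address holds the `sync` slot. The sketch below is only illustrative: the function names and the idea of matching on a site address prefix (`10.10.128.` for the primary site in this report) are assumptions, and it expects unaligned `psql` output with `|` as the field separator.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of the charm).
# Input on stdin: rows of "application_name|client_addr|state|sync_state",
# e.g. produced by:
#   psql -At -F '|' -c "SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication"
# (connection options omitted; they depend on the deployment).

# Print the client_addr of every standby whose sync_state is "sync".
sync_standby_addrs() {
    awk -F'|' '$4 == "sync" { print $2 }'
}

# Return 0 only if at least one sync standby exists and every sync
# standby address starts with the prefix given as $1 (assumed to
# identify the local site).
sync_standby_is_local() {
    awk -F'|' -v p="$1" '
        $4 == "sync" { n++; if (index($2, p) != 1) bad = 1 }
        END { exit (n == 0 || bad) ? 1 : 0 }
    '
}
```

With the output shown above, `sync_standby_addrs` would print `10.10.118.24`, and `sync_standby_is_local "10.10.128."` would fail, which matches the reported symptom of the secondary cluster's unit holding the `sync` role.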

Workaround:
- Raise `synchronous_node_count` from `1` to `2` by editing the Patroni dynamic configuration directly. `juju config` does not allow a value greater than 1.
```
sudo charmed-postgresql.patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml edit-config
```
- Restart patroni on the main unit of the secondary cluster (10.10.118.24)
```
sudo snap restart charmed-postgresql.patroni
```
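
The two workaround steps could also be scripted without the interactive editor. This is a rough sketch, not a supported procedure: the function names and the `DRY_RUN` switch are hypothetical, it relies on `patronictl edit-config` accepting `--set` and `--force`, and it assumes `snap restart` is what the restart step intends. The charm may overwrite the setting later.

```shell
#!/bin/sh
# Sketch only: non-interactive form of the workaround above.
PATRONI_CONF=/var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml

# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Step 1: set synchronous_node_count in the Patroni dynamic configuration.
# $1 is the new value (2 in this report).
set_sync_node_count() {
    run sudo charmed-postgresql.patronictl -c "$PATRONI_CONF" \
        edit-config --set "synchronous_node_count=$1" --force
}

# Step 2: restart Patroni. Run this on the main unit of the
# secondary cluster (10.10.118.24 in this report).
restart_patroni() {
    run sudo snap restart charmed-postgresql.patroni
}
```

Calling `DRY_RUN=1 set_sync_node_count 2` prints the command without changing anything, which is a reasonable first check before applying it to a live cluster.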

Now the correct unit (10.10.128.24, new Leader) has taken over the `sync` state:
```
postgres=# SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication;
 application_name | client_addr  |   state   | sync_state 
------------------+--------------+-----------+------------
 postgresql-0     | 10.10.128.24 | streaming | sync
 postgresql-1     | 10.10.118.23 | streaming | async
(2 rows)
```

And `patronictl list` reports the expected roles:
```
sudo patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml list
+ Cluster: postgresql (7633847608209757521) -+-----------+----+-----------+
| Member       | Host         | Role         | State     | TL | Lag in MB |
+--------------+--------------+--------------+-----------+----+-----------+
| postgresql-0 | 10.10.128.24 | Leader       | running   |  4 |           |
| postgresql-1 | 10.10.128.23 | Sync Standby | streaming |  4 |         0 |
+--------------+--------------+--------------+-----------+----+-----------+
```

