Refactor replication FSM: per-workload tables, idempotent transitions, integrated invalid state

Parent: #726

## Background

The replication state machine in `ceph/replication.go` (3 states, 7 events) is shared across RBD and CephFS workloads via a single transition table. Several inconsistencies follow from that:

- `replication_invalid` is declared but unreachable — no state config, no edges. CephFS `GetResourceState` returns it (`ceph/replication_cephfs.go:119`), after which every event hits the unhandled-trigger handler.
- CephFS does not support `configure`, `promote`, or `demote` (not implemented upstream). Handlers stub them with errors (`ceph/replication_cephfs.go:203,283,289`), but the FSM still advertises those events as legal transitions, so the FSM accepts the trigger and the handler then errors. Failure surface is misleading.
- Self-transitions are wired asymmetrically: `disabled` state has both `OnEntryFrom(disable)` and `InternalTransition(disable)` — the `OnEntryFrom` is dead in this position. `enabled` has `OnEntryFrom(enable)` with no `InternalTransition`, so re-enable on enabled hits the unhandled handler. RBD handlers contain idempotency guards (`ceph/replication_rbd.go:324-326,380-381`) that the FSM rejects before they can run.
- Workload-level `promote` and `demote` (URL `PUT /ops/replication/{wl}`, no resource ID) currently go through the per-resource FSM with an empty resource. `GetResourceState` runs against a zero-valued struct, FSM seeds with whatever state that produces, transition succeeds, handler iterates real pools and silently filters. Net: promote on a cluster with zero enabled mirrors returns 200 with nothing done.
- The FSM is per-request and per-resource, but several handlers operate site-wide. The model claims more uniformity than it delivers.

## Proposed changes

### 1. Per-workload FSM construction

Move FSM construction off the package-level `GetReplicationStateMachine` and onto each handler via a new interface method:

```go
type ReplicationHandlerInterface interface {
    // ... existing methods
    GetStateMachine(initialState ReplicationState) *stateless.StateMachine
}
```

Shared scaffolding (logger callback, unhandled-trigger callback, type registration loop) is factored into a private `newBaseFsm` helper. Each workload then wires only the events it actually supports.

- RBD: enable, disable, configure, list, status.
- CephFS: enable, disable, list, status.

Promote and demote are removed from the FSM entirely (see section 4).

CephFS-specific stubbed handlers (`ConfigureHandler`, `PromoteHandler`, `DemoteHandler`) are deleted from the interface or made optional. Unsupported events on a workload hit `unhandledTransitionHandler` and return a clean `operation X not permitted` error rather than reaching a stub.
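A minimal sketch of the per-workload event sets, using only the standard library rather than the real `stateless` wiring. `Workload`, `Event`, `supportedEvents`, and `checkPermitted` are hypothetical names for illustration; only the event lists and the `operation X not permitted` wording come from this issue.

```go
package main

import "fmt"

// Workload and Event are illustrative stand-ins for the real types.
type Workload string
type Event string

const (
	WorkloadRBD    Workload = "rbd"
	WorkloadCephFS Workload = "cephfs"

	EventEnable    Event = "enable"
	EventDisable   Event = "disable"
	EventConfigure Event = "configure"
	EventList      Event = "list"
	EventStatus    Event = "status"
	EventPromote   Event = "promote"
)

// supportedEvents mirrors the per-workload lists above. Promote and demote
// appear in neither set because they no longer go through the FSM.
var supportedEvents = map[Workload]map[Event]bool{
	WorkloadRBD: {
		EventEnable: true, EventDisable: true, EventConfigure: true,
		EventList: true, EventStatus: true,
	},
	WorkloadCephFS: {
		EventEnable: true, EventDisable: true,
		EventList: true, EventStatus: true,
	},
}

// checkPermitted plays the role of the unhandled-trigger callback: an event
// that a workload does not wire is rejected before any handler runs.
func checkPermitted(wl Workload, ev Event) error {
	events, ok := supportedEvents[wl]
	if !ok {
		return fmt.Errorf("unknown workload %q", wl)
	}
	if !events[ev] {
		return fmt.Errorf("operation %s not permitted", ev)
	}
	return nil
}
```

With this shape, `checkPermitted(WorkloadCephFS, EventConfigure)` fails at the table lookup, so no CephFS stub handler is needed to produce the error.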

### 2. Integrate `invalid` state

`replication_invalid` becomes a recoverable state with two outbound transitions and two read-only internal transitions:

| From | Event | To |
|---|---|---|
| invalid | enable | enabled |
| invalid | disable | disabled |
| invalid | list | invalid |
| invalid | status | invalid |

Configure / promote / demote on invalid remain unhandled (operator forced to enable or disable to recover first).
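The table above can be read as a lookup keyed on (state, event). The sketch below is a standard-library illustration only; `State`, `Event`, `edge`, `invalidEdges`, and `next` are assumed names, not the project's API, and the error text is a placeholder.

```go
package main

import "fmt"

type State string
type Event string

const (
	StateEnabled  State = "enabled"
	StateDisabled State = "disabled"
	StateInvalid  State = "invalid"

	EventEnable    Event = "enable"
	EventDisable   Event = "disable"
	EventList      Event = "list"
	EventStatus    Event = "status"
	EventConfigure Event = "configure"
)

// edge identifies a transition by its source state and triggering event.
type edge struct {
	from State
	on   Event
}

// invalidEdges holds exactly the four rows of the table above.
var invalidEdges = map[edge]State{
	{StateInvalid, EventEnable}:  StateEnabled,
	{StateInvalid, EventDisable}: StateDisabled,
	{StateInvalid, EventList}:    StateInvalid,
	{StateInvalid, EventStatus}:  StateInvalid,
}

// next returns the target state, or an error for any event the table does
// not list (configure, promote, demote), leaving the state unchanged.
func next(from State, ev Event) (State, error) {
	to, ok := invalidEdges[edge{from, ev}]
	if !ok {
		return from, fmt.Errorf("operation %s not permitted in state %s", ev, from)
	}
	return to, nil
}
```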

CephFS path that returns `StateInvalidReplication` (`ceph/replication_cephfs.go:119`) needs verification during local testing: confirm the disable handler tolerates the underlying condition that triggers invalid in the first place.

### 3. Idempotent self-loops on enabled and disabled

Replace the dead `OnEntryFrom(self)` wiring with explicit `InternalTransition` self-loops on both states:

- `disabled --disable--> disabled` (InternalTransition): re-disable runs cleanup. Handlers must tolerate `nothing to disable`.
- `enabled --enable--> enabled` (InternalTransition): re-enable is idempotent. RBD handlers already check `PoolInfo.Mode` and return nil when target mode is in place.

Rationale: Ceph itself models mirroring as converge-to-target. The FSM should match. Re-enable is not an error; re-disable is the operator's recovery tool when prior disable left orphan state.
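The intended behavior of the self-loops can be sketched as follows: the edge exists in the table, so the handler is reached, and the state does not change. This is a standard-library illustration, not the `stateless` configuration; `key`, `table`, and `fire` are hypothetical names.

```go
package main

import "fmt"

type State string
type Event string

const (
	StateEnabled  State = "enabled"
	StateDisabled State = "disabled"

	EventEnable  Event = "enable"
	EventDisable Event = "disable"
)

type key struct {
	from State
	on   Event
}

// table lists the two lifecycle edges plus the two self-loops. In the real
// FSM the self-loops would be InternalTransition entries.
var table = map[key]State{
	{StateDisabled, EventEnable}:  StateEnabled,
	{StateEnabled, EventDisable}:  StateDisabled,
	{StateEnabled, EventEnable}:   StateEnabled,  // idempotent re-enable
	{StateDisabled, EventDisable}: StateDisabled, // re-disable runs cleanup
}

// fire looks up the edge and runs the handler for every permitted event,
// self-loops included. Events without an edge never reach the handler.
func fire(from State, ev Event, handler func() error) (State, error) {
	to, ok := table[key{from, ev}]
	if !ok {
		return from, fmt.Errorf("operation %s not permitted in state %s", ev, from)
	}
	if err := handler(); err != nil {
		return from, err
	}
	return to, nil
}
```

Because the handler runs on the self-loop, its own idempotency guard decides whether any work is needed, rather than the FSM rejecting the request first.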

### 4. Site-wide actions bypass the FSM

Workload-level promote and demote (URL `PUT /ops/replication/{wl}`) are aggregates over many resources, not lifecycle transitions on a single resource. They should not run through the per-resource FSM.

In `api/ops_replication.go::handleReplicationRequest`, branch before PreFill:

```go
event := req.GetWorkloadRequestType()
if isWorkloadAction(event) {
    return runWorkloadAction(ctx, s, rh, req, event)
}
// existing per-resource FSM path
```

`runWorkloadAction` dispatches directly to handler-side promote/demote logic (today's `handleSiteOp`), which enumerates pools, applies the action where applicable, and returns an aggregate result body:

```json
{ "promoted": 3, "skipped": 1, "errors": [] }
```

Operator sees explicit counts; `promoted 0 of 0` is no longer ambiguous with success.

Promote/demote handler methods stay on the RBD handler. CephFS handler does not implement them; `runWorkloadAction` returns a clear not-supported error when called against CephFS.
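A rough sketch of the aggregate path, assuming a simplified signature. `pool`, `aggregate`, `errNotSupported`, and `promoteAll` are illustrative names, and the real `runWorkloadAction` takes the request context and handler as shown earlier; only the JSON field names come from this issue.

```go
package main

import (
	"encoding/json"
	"errors"
)

// pool is a hypothetical view of one resource the action iterates over.
type pool struct {
	name     string
	mirrored bool
}

// aggregate matches the result body proposed above.
type aggregate struct {
	Promoted int      `json:"promoted"`
	Skipped  int      `json:"skipped"`
	Errors   []string `json:"errors"`
}

var errNotSupported = errors.New("workload action not supported")

// promoteAll applies promote to every mirrored pool, counts the rest as
// skipped, and collects per-pool failures instead of aborting.
func promoteAll(workload string, pools []pool, promote func(pool) error) ([]byte, error) {
	if workload != "rbd" {
		return nil, errNotSupported
	}
	res := aggregate{Errors: []string{}} // non-nil so JSON shows [] not null
	for _, p := range pools {
		if !p.mirrored {
			res.Skipped++
			continue
		}
		if err := promote(p); err != nil {
			res.Errors = append(res.Errors, p.name+": "+err.Error())
			continue
		}
		res.Promoted++
	}
	return json.Marshal(res)
}
```

With no pools at all, the body is `{"promoted":0,"skipped":0,"errors":[]}`, which makes the "nothing done" case visible to the caller.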

### 5. Idempotent enable/disable steps

`EnableHandler` and `DisableHandler` today perform multi-step Ceph operations with no rollback path. Partial failures leave a half-configured cluster.

Convert each step to an `ensureX(ctx)` shape: probe current Ceph state, skip if already at target, apply otherwise. Step audit and per-step probe design happen during implementation. Pairs with section 3: enable converges forward, disable converges backward, and a retry after failure picks up where the previous run stopped.
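One possible shape for the probe-then-apply pattern, as a sketch. `step` and `ensureAll` are hypothetical, the context parameter is omitted for brevity, and the actual steps and probes are still to be determined during implementation.

```go
package main

import "fmt"

// step pairs a read-only probe with the mutation that reaches the target.
type step struct {
	name  string
	probe func() (atTarget bool, err error)
	apply func() error
}

// ensureAll runs steps in order. A step already at its target is skipped,
// so rerunning after a partial failure resumes at the first unfinished step.
func ensureAll(steps []step) error {
	for _, s := range steps {
		atTarget, err := s.probe()
		if err != nil {
			return fmt.Errorf("probe %s: %w", s.name, err)
		}
		if atTarget {
			continue
		}
		if err := s.apply(); err != nil {
			return fmt.Errorf("apply %s: %w", s.name, err)
		}
	}
	return nil
}
```

The sequence stops at the first failing step; nothing is rolled back, and the next call converges from whatever state the cluster is in.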

## Out of scope

- REST URL redesign and wire-schema cleanup. Tracked in a sibling issue under #726.
- Concurrency safety across simultaneous mutating requests. Tracked in a sibling issue under #726.

## Acceptance

- All replication tests pass with new per-workload FSM construction.
- Re-enable on enabled and re-disable on disabled return success and reach handler logic.
- CephFS configure / promote / demote return a uniform FSM `not permitted` error rather than handler-stub error.
- Promote on a cluster with zero enabled RBD mirrors returns an explicit aggregate result body, not a silent 200.
- `replication_invalid` state can recover via enable or disable.
- Local testing covers the conditions that drive CephFS into the invalid state.
