
Serialize concurrent mutating replication requests on the same resource #729

@UtkarshBhatthere

Parent: #726
Depends on: #727 (site-wide actions must be dispatched outside the per-resource FSM so the workload-scoped lock has a stable target).

Background

Replication mutating endpoints (POST / PUT / DELETE on /ops/replication/{wl}/{name}) are registered with ProxyTarget: false in api/ops_replication.go:31-41. A request lands on whichever microceph node received it. There is no in-process or cross-node coordination preventing two simultaneous mutations against the same Ceph resource.

Failure modes:

  • Two concurrent enable requests on the same RBD pool both run PreFill, which snapshots PoolInfo.Mode and PoolStatus.State from Ceph (ceph/replication_rbd.go:85-109). Both believe the pool is not yet enabled. Both fire the FSM (each with its own state machine instance). Both call handlePoolEnablement, whose idempotency guard (ceph/replication_rbd.go:324-326) is read-then-act and does not interlock with the parallel request. The second request may issue a duplicate rbd mirror pool peer add or leave schedules partially configured.
  • Concurrent disable and enable on the same resource produce ordering-dependent outcomes the operator cannot predict.
  • Concurrent enable requests landing on different microceph nodes are serialized only to the extent that the Ceph mons enforce it; node-local state (keyrings written to disk, IsRemoteConfiguredForRbdMirror checks) is not coordinated.

The per-request FSM provides no help: each request gets its own state machine, so there is no shared "this resource is currently being modified" signal.
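The read-then-act hazard described above can be reproduced without any concurrency primitives by taking both snapshots before either request acts. This is a minimal sketch: the pool and snapshot types and the preFill and enable functions are hypothetical stand-ins for illustration, not the real microceph code.

```go
package main

import "fmt"

// pool is a hypothetical stand-in for the Ceph-side state of one RBD pool.
type pool struct {
	mirroringEnabled bool
	peerAddCalls     int // counts simulated "rbd mirror pool peer add" invocations
}

// snapshot mimics what a PreFill-style step captures: a point-in-time copy.
type snapshot struct{ mirroringEnabled bool }

func preFill(p *pool) snapshot {
	return snapshot{mirroringEnabled: p.mirroringEnabled}
}

// enable mimics a read-then-act guard: it decides from the (possibly stale)
// snapshot rather than from the live state.
func enable(p *pool, s snapshot) {
	if s.mirroringEnabled {
		return // guard: already enabled, nothing to do
	}
	p.mirroringEnabled = true
	p.peerAddCalls++
}

// interleaved models two requests whose snapshots are both taken before
// either request acts, as can happen when nothing serializes them.
func interleaved() int {
	p := &pool{}
	a, b := preFill(p), preFill(p)
	enable(p, a)
	enable(p, b) // stale snapshot: the guard does not fire
	return p.peerAddCalls
}

// serialized models the same two requests run strictly one after the other.
func serialized() int {
	p := &pool{}
	enable(p, preFill(p))
	enable(p, preFill(p)) // fresh snapshot: the guard fires
	return p.peerAddCalls
}

func main() {
	fmt.Println("interleaved peer adds:", interleaved()) // 2
	fmt.Println("serialized peer adds:", serialized())   // 1
}
```

The guard only works when the snapshot and the action happen with no other writer in between, which is what the rest of this issue sets out to guarantee.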

Proposed approach

Two layers, both required.

1. Route mutating requests through the cluster leader

Flip mutating replication endpoints to ProxyTarget: true so that microcluster routes them to the leader. Reads stay non-proxied for parallelism.

This serializes all mutating replication ops onto a single node, which is a precondition for in-process locking to be meaningful.

2. Per-resource named mutex on the executing node

Introduce a process-local lock keyed by {workload}/{resource-id}:

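// ceph/replication_lock.go

// repLocks holds one mutex per replication resource, keyed by "{workload}/{resource-id}".
var repLocks sync.Map // map[string]*sync.Mutex

// AcquireReplicationLock blocks until the caller holds the lock for the
// resource named in req, then returns the function that releases it.
func AcquireReplicationLock(req types.ReplicationRequest) func() {
    key := fmt.Sprintf("%s/%s", req.GetWorkloadType(), req.GetAPIObjectID())
    // LoadOrStore ensures concurrent callers for the same key share one mutex.
    m, _ := repLocks.LoadOrStore(key, &sync.Mutex{})
    mu := m.(*sync.Mutex)
    mu.Lock()
    return mu.Unlock
}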

Wired in api/ops_replication.go::handleReplicationRequest around PreFill + FireCtx:

release := ceph.AcquireReplicationLock(req)
defer release()
err := rh.PreFill(ctx, req)
// ...
err = repFsm.FireCtx(ctx, event, rh, &resp, ...)

Reads (GET endpoints) skip lock acquisition. Writes block until the prior write on the same resource releases.

The sync.Map of mutexes accumulates one entry per unique resource ever touched, so its size is bounded by the total resource count. Eviction is unnecessary at the expected scale.
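To make the locking behaviour concrete, here is a self-contained sketch of the same keyed-mutex pattern using plain string keys instead of types.ReplicationRequest. The names acquire, runWriters and independentKeys, and the keys "rbd/pool-a" and "rbd/pool-b", are illustrative assumptions and not part of the proposal.

```go
package main

import (
	"fmt"
	"sync"
)

var locks sync.Map // map[string]*sync.Mutex

// acquire locks the mutex for key, creating it on first use, and returns
// the function that releases it.
func acquire(key string) func() {
	m, _ := locks.LoadOrStore(key, &sync.Mutex{})
	mu := m.(*sync.Mutex)
	mu.Lock()
	return mu.Unlock
}

// runWriters starts n goroutines that each perform a read-modify-write on a
// shared counter, protected only by the lock for key.
func runWriters(key string, n int) int {
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			release := acquire(key)
			defer release()
			counter++ // safe: all writers share the same per-key mutex
		}()
	}
	wg.Wait()
	return counter
}

// independentKeys holds the lock for one key while another goroutine takes
// and releases the lock for a different key. It would never return if the
// two keys shared a mutex.
func independentKeys() bool {
	releaseA := acquire("rbd/pool-a")
	defer releaseA()
	done := make(chan struct{})
	go func() {
		release := acquire("rbd/pool-b")
		release()
		close(done)
	}()
	<-done
	return true
}

func main() {
	fmt.Println("writes applied:", runWriters("rbd/pool-a", 50)) // 50
	fmt.Println("other keys unblocked:", independentKeys())      // true
}
```

One detail worth noting: LoadOrStore allocates a fresh sync.Mutex on every call, even when the key already exists. If that matters, a Load followed by LoadOrStore on a miss avoids the extra allocation.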

Site-wide actions

Workload-level promote and demote (dispatched outside the per-resource FSM after #727) run against many resources. They acquire a workload-scoped lock (one mutex per workload) that excludes per-resource mutations during a site-wide action.

Tradeoff: a site-wide action blocks all per-resource ops on the workload for its duration. This is acceptable given that site-wide actions are rare and operator-driven.
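One possible way to get this exclusion, sketched under assumptions: per-resource mutations take a shared hold on a per-workload sync.RWMutex before taking their own resource mutex, and a site-wide action takes the exclusive hold. The issue only specifies "one mutex per workload", so the choice of RWMutex and the names acquireResource, acquireSiteWide and siteWideThenResource are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	workloadLocks sync.Map // map[string]*sync.RWMutex, keyed by workload
	resourceLocks sync.Map // map[string]*sync.Mutex, keyed by workload/resource
)

func workloadLock(wl string) *sync.RWMutex {
	m, _ := workloadLocks.LoadOrStore(wl, &sync.RWMutex{})
	return m.(*sync.RWMutex)
}

// acquireResource takes a shared hold on the workload, then an exclusive
// hold on the resource. Lock order is always workload first, resource second.
func acquireResource(wl, id string) func() {
	w := workloadLock(wl)
	w.RLock()
	m, _ := resourceLocks.LoadOrStore(wl+"/"+id, &sync.Mutex{})
	mu := m.(*sync.Mutex)
	mu.Lock()
	return func() {
		mu.Unlock()
		w.RUnlock()
	}
}

// acquireSiteWide takes the exclusive hold on the workload, which waits for
// in-flight per-resource mutations and blocks new ones.
func acquireSiteWide(wl string) func() {
	w := workloadLock(wl)
	w.Lock()
	return w.Unlock
}

// siteWideThenResource starts a per-resource mutation while a site-wide
// action holds the workload lock and returns the order in which they ran.
func siteWideThenResource() []string {
	var mu sync.Mutex
	var order []string
	record := func(s string) {
		mu.Lock()
		order = append(order, s)
		mu.Unlock()
	}

	release := acquireSiteWide("rbd")
	done := make(chan struct{})
	go func() {
		r := acquireResource("rbd", "pool-a") // blocks until release() below
		record("resource")
		r()
		close(done)
	}()
	record("site-wide")
	release()
	<-done
	return order
}

func main() {
	fmt.Println(siteWideThenResource()) // [site-wide resource]
}
```

With this layout, per-resource mutations on different resources still run in parallel, since they only share the read side of the workload lock.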

Out of scope

Acceptance

  • Two parallel POST /ops/replication/rbd/pools/foo requests serialize: the second observes the first's mutation in its PreFill.
  • Read endpoints remain unblocked by writes on the same resource.
  • Site-wide promote and demote exclude concurrent per-resource mutations on the same workload.
  • Mutating endpoints route to the cluster leader regardless of which node received the request.
