MicroCeph remote replication quality improvements #726

@UtkarshBhatthere

Background

The remote replication framework in MicroCeph (RBD + CephFS) is built around a per-request finite state machine (ceph/replication.go) with workload-specific handlers. A review of the framework surfaced several quality issues that fall into three coherent buckets: FSM model cleanup, REST surface redesign, and concurrency safety. Each bucket is tracked as a child issue below.
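As a rough illustration of the FSM cleanup bucket, the sketch below shows one way a per-workload transition table with idempotent transitions and an explicit invalid state could look. All names here (`State`, `Event`, `Next`, the `resync` event, the table contents) are hypothetical and are not taken from ceph/replication.go; treat it as a sketch under those assumptions, not the actual design.

```go
package main

import "fmt"

// State and Event are hypothetical types for this sketch only.
type State string
type Event string

const (
	StateDisabled State = "disabled"
	StateEnabled  State = "enabled"
	StateInvalid  State = "invalid" // part of the model, not a side channel

	EventEnable  Event = "enable"
	EventDisable Event = "disable"
	EventResync  Event = "resync" // purely illustrative, RBD-only in this sketch
)

// Table maps (current state, event) to the next state.
type Table map[State]map[Event]State

// One table per workload, so each workload declares only its own transitions.
var tables = map[string]Table{
	"rbd": {
		StateDisabled: {EventEnable: StateEnabled, EventDisable: StateDisabled},
		StateEnabled:  {EventEnable: StateEnabled, EventDisable: StateDisabled, EventResync: StateEnabled},
	},
	"cephfs": {
		StateDisabled: {EventEnable: StateEnabled, EventDisable: StateDisabled},
		StateEnabled:  {EventEnable: StateEnabled, EventDisable: StateDisabled},
	},
}

// Next returns the state reached from cur on ev for the given workload.
// Repeating an event that has already taken effect is a no-op (idempotent);
// anything not listed in the table yields StateInvalid and an error.
func Next(workload string, cur State, ev Event) (State, error) {
	table, ok := tables[workload]
	if !ok {
		return StateInvalid, fmt.Errorf("unknown workload %q", workload)
	}
	next, ok := table[cur][ev]
	if !ok {
		return StateInvalid, fmt.Errorf("%s: event %q not allowed in state %q", workload, ev, cur)
	}
	return next, nil
}

func main() {
	fmt.Println(Next("rbd", StateEnabled, EventEnable))    // repeated enable
	fmt.Println(Next("cephfs", StateEnabled, EventResync)) // not in the cephfs table
}
```

Keeping the tables as data makes it easy to see at a glance which transitions a workload supports, and a repeated request becomes a table entry rather than a special case in handler code.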

Goals

  • Make the FSM model honest about what is per-resource lifecycle and what is a site-wide action.
  • Make the REST surface identify operations by (method, path) alone, with no body-encoded discriminator and no overloaded URLs.
  • Make concurrent mutating requests on the same resource safe across the cluster.

Children

Landing order

  1. Refactor replication FSM: per-workload tables, idempotent transitions, integrated invalid state #727 (FSM cleanup).
  2. In parallel: Redesign replication REST API: resource-scoped paths, drop body-encoded request type #728 (REST redesign) and Serialize concurrent mutating replication requests on the same resource #729 (concurrency safety).
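For the concurrency bucket, the following is a minimal sketch of serializing mutating requests per resource with a keyed lock. It only covers a single process. Safety across the cluster would additionally need a lock or claim shared between members, which is not shown here. `KeyedLocker`, `runConcurrent`, and the `rbd/pool1` key format are hypothetical names, not the mechanism chosen in #729.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is one per-resource mutex plus a count of holders and waiters.
type entry struct {
	mu   sync.Mutex
	refs int
}

// KeyedLocker hands out one mutex per resource key (hypothetical helper).
type KeyedLocker struct {
	mu    sync.Mutex
	locks map[string]*entry
}

func NewKeyedLocker() *KeyedLocker {
	return &KeyedLocker{locks: make(map[string]*entry)}
}

// Lock blocks until the caller owns the key and returns the unlock function.
// Requests on different keys do not block each other.
func (k *KeyedLocker) Lock(key string) (unlock func()) {
	k.mu.Lock()
	e, ok := k.locks[key]
	if !ok {
		e = &entry{}
		k.locks[key] = e
	}
	e.refs++
	k.mu.Unlock()

	e.mu.Lock()
	return func() {
		e.mu.Unlock()
		k.mu.Lock()
		e.refs--
		if e.refs == 0 {
			delete(k.locks, key) // nobody holds or waits: drop the entry
		}
		k.mu.Unlock()
	}
}

// runConcurrent simulates `workers` mutating requests on the same resource.
// The plain int is only safe because every increment happens under the lock.
func runConcurrent(k *KeyedLocker, key string, workers int) int {
	var wg sync.WaitGroup
	applied := 0
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			unlock := k.Lock(key)
			defer unlock()
			applied++ // stands in for "read state, run FSM transition, write state"
		}()
	}
	wg.Wait()
	return applied
}

func main() {
	k := NewKeyedLocker()
	fmt.Println(runConcurrent(k, "rbd/pool1", 50))
}
```

Locking per resource key rather than globally keeps unrelated resources from queueing behind each other, while two mutating requests on the same resource still run one at a time.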

Out of scope

  • Cross-process coordination with admin-run raw rbd / ceph CLI commands.
  • RGW replication (handler not yet implemented).

Metadata

    Labels

    enhancement: New feature or request
