Skip to content

CSI Storage Service Provider enhancement document#49

Open
ebichman-1 wants to merge 1 commit into
dcm-project:mainfrom
ebichman-1:storage-sp-feature
Open

CSI Storage Service Provider enhancement document#49
ebichman-1 wants to merge 1 commit into
dcm-project:mainfrom
ebichman-1:storage-sp-feature

Conversation

@ebichman-1
Copy link
Copy Markdown
Contributor

CSI Storage Service Provider

Introduce the Storage SP enhancement document defining a new service type for DCM.
The document covers the full lifecycle of managing persistent storage volumes on Kubernetes clusters via CSI, including:

  • block and file storage support
  • StorageSpec schema
  • API endpoints (CRUD + StorageClass discovery)
  • status reporting via CloudEvents over NATS
  • user flow diagrams
  • registration with capability advertisement
  • Podman/podman-compose deployment model

According to - FLPATH-4113

…orage Service Provider enhancement document

Introduce the Storage SP enhancement document defining a new service type for DCM.
The document covers the full lifecycle of managing persistent storage volumes on Kubernetes clusters via CSI, including:
- block and file storage support
- StorageSpec schema
- API endpoints (CRUD + StorageClass discovery)
- status reporting via CloudEvents over NATS
- user flow diagrams
- registration with capability advertisement
- Podman/podman-compose deployment model

According to - FLPATH-4113

Assistend-By: Claude Opus 4.6 (Anthropic)
Signed-off-by: ebichman-1 <ebichman@redhat.com>

## Open Questions

> 1. Should StorageClass discovery be a dedicated endpoint
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the goal for sc discovery?

> metadata advertised to DCM? The current proposal keeps both — registration
> metadata carries a static snapshot while the endpoint reflects live cluster
> state.
> 2. Should the Storage SP manage its own Kubernetes namespace for PVCs, or
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my opinion pvcs should be created in the same namespace as the application using it.

> should it accept a per-request namespace override via `providerHints`? v1
> uses a single configured namespace; per-request namespace is deferred.
> 3. The `service-type-definitions.md` currently lists standalone storage as a
> Non-Goal. This document introduces `storage` as a fifth service type.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we need to define new service type and update corresponding references.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll take that into consideration.

persistent storage volumes on Kubernetes clusters via the Container Storage
Interface (CSI). It supports two storage categories: block storage (PVCs backed
by block devices such as Ceph RBD) and file storage (PVCs backed by shared
filesystems such as NFS or CephFS). The SP implements the `storage` service
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this selection? When you say Ceph do you mean ODF? What should happen when we have LSO, HPP or LVM?

type schema, which must be added to
`enhancements/service-type-definitions/service-type-definitions.md` as a
prerequisite. The Storage SP is deployed as a Podman container alongside the
other DCM services using `podman-compose`, and connects to one or more target
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do I understand correctly that we would like to manage storage for one or more clusters? How would it work?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, SPs are linked to a specific cluster, user credentials and namespace (I think @machacekondra suggested to coin it an environment). I think this comment taps into a bigger discussion. Does a single SP manage multiple environments. If yes, how does it get the environment details?
This also taps into the question of whether there should be a single SP for all things K8S instead of a different one per service-type

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. This should definitely be part of our broader architecture discussion regarding environment management.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ygalblum in the text we have connects to one or more target. My goal with this question was to correct it. I agree that it is related to multi env discussion.


### Registration Flow

The Storage SP must successfully register with DCM before it can receive
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we tell what is storage capacity?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pkliczewski how do you do that in OCP? I don't know that you can

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we effectively provide storage options to the user while without let him know how much capacity he have there ?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ygalblum I know it is not that easy to get it. please take a look here


### StorageSpec Schema

The `storage` service type is a new fifth service type that must be added to
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not clear to me what we want PE/user to customize and why


### User Flows

#### Create Block Storage Volume (End-to-End)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not to have a single flow create volume? both filesystem and block are the same only provisioner is different (detail which is not important here)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Just note that even the provisioner doesn't have to be different, only the parameter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take both of these suggestions into the implementation plan.


### Implementation Details

#### PVC Naming Validation
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will be our naming scheme to mitigate name conflicts in a namespace?


- The Storage SP is stateless with respect to its own process — all state lives
in Kubernetes PVC labels and in the DCM database. Upgrading the container
image and restarting via `podman-compose up --pull always` is sufficient.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be good to control when containers are updated. We not always want to upgrade when new version is available.

`statusMessage`, `createTime`, `updateTime`.

| Field | Required | Type | Description |
| ------------- | -------- | ------ | -------------------------------------------------------- |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice doc overall -- the API surface is well thought out.

One thing I'd flag in the StorageSpec schema: the storageType field (block | file) is going to cause confusion with Kubernetes' own volumeMode (Filesystem | Block). The default case is especially weird -- storageType: "block" produces a PVC with volumeMode: Filesystem, which in K8s means a mounted directory. Most people would call that "file storage," not "block storage."

What storageType is really describing is the provisioner backend (RBD vs NFS), not the access semantics. But by using the word "block," it collides with what K8s already means by "Block" (raw block device, no filesystem). So the mapping ends up being:

  • DCM block + K8s Filesystem = normal mounted directory (the common case)
  • DCM block + K8s Block = raw block device (rare)
  • DCM file + K8s Filesystem = also a mounted directory, but via NFS

The StorageClass already encodes everything storageType is trying to express, and the /storage/classes discovery endpoint gives users visibility into what's available. Imo it'd be cleaner to drop storageType entirely and let storageClassName do the work. If there's a reason to keep it, renaming it to something that doesn't overlap with K8s vocabulary would help a lot -- maybe provisionerType or backendType.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the need to separate storageType and VolumeMode

Also, the name of the storage class is meaningless. It does not have to imply any of the SC's capabilities. Furthermore, a single SC can support both Block and FileSystem and can support all accessModes. So, if anything, it is the StorageClassName that can be removed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chadcrum, @ygalblum actually raised this in the earlier comments, and on second thought, I agree it makes sense.
I'm definitely open for your all suggestions based on the feedback comments.

- `storageClassName`, if specified, must be present in the cluster's
StorageClass list; otherwise the SP returns `422 Unprocessable Entity`.
- `metadata.name` must be a valid Kubernetes name: lowercase alphanumeric and
`-`, max 253 characters, must not start or end with `-`.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule rejects valid requests. NFS and CephFS both support ReadWriteOnce -- it just means one node mounts at a time. Kubernetes imposes no such restriction either; you can create a PVC with storageClassName: nfs-csi and accessMode: ReadWriteOnce and it works fine.

The doc's own discovery response (line 457) lists RWO as a supported mode for nfs-csi:

"supportedAccessModes": ["ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany"]

So the /storage/classes endpoint advertises RWO support, but the validation layer rejects it -- the SP contradicts its own public contract.

RWX as the default for file storage makes sense. RWX as a requirement does not. Suggest removing the hard 422 and instead validating access modes against the actual supportedAccessModes from the target StorageClass, which the SP already has from startup discovery.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, great point, thanks

add the `storage` status enum (`PENDING`, `PROVISIONING`, `READY`, `FAILED`,
`DELETING`, `DELETED`) and the `StorageStatus` Go struct.
- Update `enhancements/user-flows/user-flows.md`: add Storage SP to the system
overview table (section 1) and to the SP instance creation flow (section 6.4).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is listed as both a prerequisite (lines 35-37, 46-47) and a follow-up item (here). Can't be both -- service-type-definitions.md currently says standalone storage is a Non-Goal, so merging this doc without that update means two enhancement docs that directly contradict each other.

I checked and there's no open PR or issue tracking this work. Wherever we land on ordering (update first, or merge this as the proposal to change it), it should be tracked somewhere so it doesn't fall through the cracks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it a good point, I agree that we should avoid maintaining 2 docs.
I haven't opened a PR yet, as I’m still not sure and refining the approach and want to ensure it aligns with our requirements.

- `dcm-service-type=storage`

The `dcm-instance-id` UUID is generated by DCM and passed in the request. If a
PVC with the same `metadata.name` already exists in the configured namespace,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth specifying what happens when DCM retries a CREATE with the same dcm-instance-id (e.g., the 201 was lost in transit). The doc covers name conflicts (409) but not idempotent retry.

For reference, the existing SPs are inconsistent here -- ACM cluster SP checks for a matching dcm-instance-id before creating and returns 409 if found (but doesn't return the existing resource), while the KubeVirt SP has no duplicate check at all and silently creates a second resource. Neither does a true idempotent return.

This doc has an opportunity to get it right. At minimum it should specify: does the SP check dcm-instance-id before creating, and if a match is found, does it return 409 or return the existing resource as 200/201?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. But the decision should not be specific to this SP but rather a generic behaviour for all SPs for all ServiceTypes. Should they be idempotent or should they return a conflict.
Need to keep in mind that not failing the request raises the question of what to do with differences between the request and the existing resource (e.g. size of volume). Is a mismatch a conflict? Is it an update (currently out of scope)? Is it a noop but then the resource is not as expected by the last caller?


## Open Questions

> 1. Should StorageClass discovery be a dedicated endpoint
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Styling - why is this section in a block-quote?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've included this block temporarily to facilitate our brainstorming and to help me map out the full scope of the requirements.
I agree that it should be removed once I'll have the whole plan and I can finalize the documentation.

filesystems such as NFS or CephFS). The SP implements the `storage` service
type schema, which must be added to
`enhancements/service-type-definitions/service-type-definitions.md` as a
prerequisite. The Storage SP is deployed as a Podman container alongside the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments here:

  1. The SP is a containerized application. Podman is just a tool to run containers
  2. As a result, it doesn't have to be deployed using Podman. It can be Docker or a K8S cluster
  3. Similarly, it doesn't have to be podman-compose. For example, https://github.com/dcm-project/quadlet-deploy uses Quadlet
  4. Lastly, SPs don't have to be deployed as part of DCM. They just need to be be able to connect to and be connected from DCM.

type schema, which must be added to
`enhancements/service-type-definitions/service-type-definitions.md` as a
prerequisite. The Storage SP is deployed as a Podman container alongside the
other DCM services using `podman-compose`, and connects to one or more target
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, SPs are linked to a specific cluster, user credentials and namespace (I think @machacekondra suggested to coin it an environment). I think this comment taps into a bigger discussion. Does a single SP manage multiple environments. If yes, how does it get the environment details?
This also taps into the question of whether there should be a single SP for all things K8S instead of a different one per service-type

valid `providerHints` values without trial and error.
- Define the status reporting mechanism using CloudEvents over NATS.
- Support both block storage and file storage types within a single SP instance.
- Define the `storage` service type schema (`StorageSpec`).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The service type is generic and can be supported by different SPs. So, I think this part should go either here: https://github.com/dcm-project/enhancements/blob/6d3350d8f0909dd1ef5d4c8291c9f2f560cbefab/enhancements/service-type-definitions/service-type-definitions.md or in a separate doc.
@dcm-project/maintainers WDYT?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it should go into the service-type-definition.md enhancement doc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ygalblum, I wasn't sure on that, I'll go with your suggestions feedback comments.


- Day 2 operations: resize, snapshot, clone, backup, or any update to an
existing volume.
- CSI driver installation or management — the driver must be pre-installed on
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not. The Cluster SP created clusters owned by the user. It is up to the user to configure them. While DCM has access to the credentials, it should not abuse them

- `storageType: file` requires `accessMode: ReadWriteMany` or `ReadOnlyMany`.
If `accessMode: ReadWriteOnce` is specified with `file`, the SP returns
`422 Unprocessable Entity`.
- `volumeMode: Block` is only valid when `storageType: block`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote, this seems like a duplication

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll remove it


### User Flows

#### Create Block Storage Volume (End-to-End)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Just note that even the provisioner doesn't have to be different, only the parameter

| **WaitForFirstConsumer StorageClasses** — PVC stays `Pending` indefinitely until mounted by a Pod. Since v1 provisions standalone storage only (no Pod mounting), these StorageClasses will not work with this SP. | Document as a known limitation. Check StorageClass `volumeBindingMode` at creation time; if `WaitForFirstConsumer` is detected and no workload attachment is requested, return `422 Unprocessable Entity`. |
| **PVC protection** — Kubernetes defers PVC deletion while it is actively used by a Pod (`kubernetes.io/pvc-protection` finalizer). The SP issues the DELETE but the PVC may remain until the consuming Pod is removed. | Publish `DELETING` status immediately on DELETE request. Continue watching the informer until the PVC is fully removed and then publish `DELETED`. |
| **PVC admission webhook rejection** — Cluster admission controllers can reject PVC creation for quota, policy, or naming reasons. | Map Kubernetes admission webhook errors (HTTP 403 or rejection bodies) to `422 Unprocessable Entity` in the SP's error handling layer. |
| **StorageClass exhaustion** — The target StorageClass may run out of available capacity on the backend. | The CSI driver reports this via PVC events. The informer catches `ProvisioningFailed` events and publishes a `FAILED` status with the event message as `statusMessage`. |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about this. K8S is eventually consistent. So, it considers this case as transient and keep on trying to provision the volume (which will succeed once storage is available)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve outlined a few potential status codes to return for various failure scenarios from ks8.
We should decide which details to expose to the user and which should remain internal for logging and debugging.


- The Storage SP is stateless with respect to its own process — all state lives
in Kubernetes PVC labels and in the DCM database. Upgrading the container
image and restarting via `podman-compose up --pull always` is sufficient.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, referencing podman, podman-compose or any other such command is incorrect

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, agree, I need to modify this one

The following items are explicitly out of scope for this document but must be
tracked:

- Update `enhancements/service-type-definitions/service-type-definitions.md`:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote, I don't this the definition of the Storage ServiceType should be in the current doc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, right

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants