CSI Storage Service Provider enhancement document#49
Conversation
…orage Service Provider enhancement document Introduce the Storage SP enhancement document defining a new service type for DCM. The document covers the full lifecycle of managing persistent storage volumes on Kubernetes clusters via CSI, including: - block and file storage support - StorageSpec schema - API endpoints (CRUD + StorageClass discovery) - status reporting via CloudEvents over NATS - user flow diagrams - registration with capability advertisement - Podman/podman-compose deployment model According to - FLPATH-4113 Assistend-By: Claude Opus 4.6 (Anthropic) Signed-off-by: ebichman-1 <ebichman@redhat.com>
|
|
||
| ## Open Questions | ||
|
|
||
| > 1. Should StorageClass discovery be a dedicated endpoint |
There was a problem hiding this comment.
what is the goal for sc discovery?
| > metadata advertised to DCM? The current proposal keeps both — registration | ||
| > metadata carries a static snapshot while the endpoint reflects live cluster | ||
| > state. | ||
| > 2. Should the Storage SP manage its own Kubernetes namespace for PVCs, or |
There was a problem hiding this comment.
in my opinion pvcs should be created in the same namespace as the application using it.
| > should it accept a per-request namespace override via `providerHints`? v1 | ||
| > uses a single configured namespace; per-request namespace is deferred. | ||
| > 3. The `service-type-definitions.md` currently lists standalone storage as a | ||
| > Non-Goal. This document introduces `storage` as a fifth service type. |
There was a problem hiding this comment.
yes, we need to define new service type and update corresponding references.
There was a problem hiding this comment.
Sure, I'll take that into consideration.
| persistent storage volumes on Kubernetes clusters via the Container Storage | ||
| Interface (CSI). It supports two storage categories: block storage (PVCs backed | ||
| by block devices such as Ceph RBD) and file storage (PVCs backed by shared | ||
| filesystems such as NFS or CephFS). The SP implements the `storage` service |
There was a problem hiding this comment.
why this selection? When you say Ceph do you mean ODF? What should happen when we have LSO, HPP or LVM?
| type schema, which must be added to | ||
| `enhancements/service-type-definitions/service-type-definitions.md` as a | ||
| prerequisite. The Storage SP is deployed as a Podman container alongside the | ||
| other DCM services using `podman-compose`, and connects to one or more target |
There was a problem hiding this comment.
do I understand correctly that we would like to manage storage for one or more clusters? How would it work?
There was a problem hiding this comment.
Currently, SPs are linked to a specific cluster, user credentials and namespace (I think @machacekondra suggested to coin it an environment). I think this comment taps into a bigger discussion. Does a single SP manage multiple environments. If yes, how does it get the environment details?
This also taps into the question of whether there should be a single SP for all things K8S instead of a different one per service-type
There was a problem hiding this comment.
That's a good point. This should definitely be part of our broader architecture discussion regarding environment management.
There was a problem hiding this comment.
@ygalblum in the text we have connects to one or more target. My goal with this question was to correct it. I agree that it is related to multi env discussion.
|
|
||
| ### Registration Flow | ||
|
|
||
| The Storage SP must successfully register with DCM before it can receive |
There was a problem hiding this comment.
how do we tell what is storage capacity?
There was a problem hiding this comment.
@pkliczewski how do you do that in OCP? I don't know that you can
There was a problem hiding this comment.
How do we effectively provide storage options to the user while without let him know how much capacity he have there ?
|
|
||
| ### StorageSpec Schema | ||
|
|
||
| The `storage` service type is a new fifth service type that must be added to |
There was a problem hiding this comment.
it is not clear to me what we want PE/user to customize and why
|
|
||
| ### User Flows | ||
|
|
||
| #### Create Block Storage Volume (End-to-End) |
There was a problem hiding this comment.
why not to have a single flow create volume? both filesystem and block are the same only provisioner is different (detail which is not important here)
There was a problem hiding this comment.
Agree. Just note that even the provisioner doesn't have to be different, only the parameter
There was a problem hiding this comment.
I'll take both of these suggestions into the implementation plan.
|
|
||
| ### Implementation Details | ||
|
|
||
| #### PVC Naming Validation |
There was a problem hiding this comment.
what will be our naming scheme to mitigate name conflicts in a namespace?
|
|
||
| - The Storage SP is stateless with respect to its own process — all state lives | ||
| in Kubernetes PVC labels and in the DCM database. Upgrading the container | ||
| image and restarting via `podman-compose up --pull always` is sufficient. |
There was a problem hiding this comment.
it would be good to control when containers are updated. We not always want to upgrade when new version is available.
| `statusMessage`, `createTime`, `updateTime`. | ||
|
|
||
| | Field | Required | Type | Description | | ||
| | ------------- | -------- | ------ | -------------------------------------------------------- | |
There was a problem hiding this comment.
Nice doc overall -- the API surface is well thought out.
One thing I'd flag in the StorageSpec schema: the storageType field (block | file) is going to cause confusion with Kubernetes' own volumeMode (Filesystem | Block). The default case is especially weird -- storageType: "block" produces a PVC with volumeMode: Filesystem, which in K8s means a mounted directory. Most people would call that "file storage," not "block storage."
What storageType is really describing is the provisioner backend (RBD vs NFS), not the access semantics. But by using the word "block," it collides with what K8s already means by "Block" (raw block device, no filesystem). So the mapping ends up being:
- DCM
block+ K8sFilesystem= normal mounted directory (the common case) - DCM
block+ K8sBlock= raw block device (rare) - DCM
file+ K8sFilesystem= also a mounted directory, but via NFS
The StorageClass already encodes everything storageType is trying to express, and the /storage/classes discovery endpoint gives users visibility into what's available. Imo it'd be cleaner to drop storageType entirely and let storageClassName do the work. If there's a reason to keep it, renaming it to something that doesn't overlap with K8s vocabulary would help a lot -- maybe provisionerType or backendType.
There was a problem hiding this comment.
I don't understand the need to separate storageType and VolumeMode
Also, the name of the storage class is meaningless. It does not have to imply any of the SC's capabilities. Furthermore, a single SC can support both Block and FileSystem and can support all accessModes. So, if anything, it is the StorageClassName that can be removed.
| - `storageClassName`, if specified, must be present in the cluster's | ||
| StorageClass list; otherwise the SP returns `422 Unprocessable Entity`. | ||
| - `metadata.name` must be a valid Kubernetes name: lowercase alphanumeric and | ||
| `-`, max 253 characters, must not start or end with `-`. |
There was a problem hiding this comment.
This rule rejects valid requests. NFS and CephFS both support ReadWriteOnce -- it just means one node mounts at a time. Kubernetes imposes no such restriction either; you can create a PVC with storageClassName: nfs-csi and accessMode: ReadWriteOnce and it works fine.
The doc's own discovery response (line 457) lists RWO as a supported mode for nfs-csi:
"supportedAccessModes": ["ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany"]So the /storage/classes endpoint advertises RWO support, but the validation layer rejects it -- the SP contradicts its own public contract.
RWX as the default for file storage makes sense. RWX as a requirement does not. Suggest removing the hard 422 and instead validating access modes against the actual supportedAccessModes from the target StorageClass, which the SP already has from startup discovery.
There was a problem hiding this comment.
Sure, great point, thanks
| add the `storage` status enum (`PENDING`, `PROVISIONING`, `READY`, `FAILED`, | ||
| `DELETING`, `DELETED`) and the `StorageStatus` Go struct. | ||
| - Update `enhancements/user-flows/user-flows.md`: add Storage SP to the system | ||
| overview table (section 1) and to the SP instance creation flow (section 6.4). |
There was a problem hiding this comment.
This is listed as both a prerequisite (lines 35-37, 46-47) and a follow-up item (here). Can't be both -- service-type-definitions.md currently says standalone storage is a Non-Goal, so merging this doc without that update means two enhancement docs that directly contradict each other.
I checked and there's no open PR or issue tracking this work. Wherever we land on ordering (update first, or merge this as the proposal to change it), it should be tracked somewhere so it doesn't fall through the cracks.
There was a problem hiding this comment.
Sure, it a good point, I agree that we should avoid maintaining 2 docs.
I haven't opened a PR yet, as I’m still not sure and refining the approach and want to ensure it aligns with our requirements.
| - `dcm-service-type=storage` | ||
|
|
||
| The `dcm-instance-id` UUID is generated by DCM and passed in the request. If a | ||
| PVC with the same `metadata.name` already exists in the configured namespace, |
There was a problem hiding this comment.
Worth specifying what happens when DCM retries a CREATE with the same dcm-instance-id (e.g., the 201 was lost in transit). The doc covers name conflicts (409) but not idempotent retry.
For reference, the existing SPs are inconsistent here -- ACM cluster SP checks for a matching dcm-instance-id before creating and returns 409 if found (but doesn't return the existing resource), while the KubeVirt SP has no duplicate check at all and silently creates a second resource. Neither does a true idempotent return.
This doc has an opportunity to get it right. At minimum it should specify: does the SP check dcm-instance-id before creating, and if a match is found, does it return 409 or return the existing resource as 200/201?
There was a problem hiding this comment.
I agree. But the decision should not be specific to this SP but rather a generic behaviour for all SPs for all ServiceTypes. Should they be idempotent or should they return a conflict.
Need to keep in mind that not failing the request raises the question of what to do with differences between the request and the existing resource (e.g. size of volume). Is a mismatch a conflict? Is it an update (currently out of scope)? Is it a noop but then the resource is not as expected by the last caller?
|
|
||
| ## Open Questions | ||
|
|
||
| > 1. Should StorageClass discovery be a dedicated endpoint |
There was a problem hiding this comment.
Styling - why is this section in a block-quote?
There was a problem hiding this comment.
I've included this block temporarily to facilitate our brainstorming and to help me map out the full scope of the requirements.
I agree that it should be removed once I'll have the whole plan and I can finalize the documentation.
| filesystems such as NFS or CephFS). The SP implements the `storage` service | ||
| type schema, which must be added to | ||
| `enhancements/service-type-definitions/service-type-definitions.md` as a | ||
| prerequisite. The Storage SP is deployed as a Podman container alongside the |
There was a problem hiding this comment.
A few comments here:
- The SP is a containerized application. Podman is just a tool to run containers
- As a result, it doesn't have to be deployed using
Podman. It can be Docker or a K8S cluster - Similarly, it doesn't have to be
podman-compose. For example, https://github.com/dcm-project/quadlet-deploy usesQuadlet - Lastly, SPs don't have to be deployed as part of DCM. They just need to be be able to connect to and be connected from DCM.
| type schema, which must be added to | ||
| `enhancements/service-type-definitions/service-type-definitions.md` as a | ||
| prerequisite. The Storage SP is deployed as a Podman container alongside the | ||
| other DCM services using `podman-compose`, and connects to one or more target |
There was a problem hiding this comment.
Currently, SPs are linked to a specific cluster, user credentials and namespace (I think @machacekondra suggested to coin it an environment). I think this comment taps into a bigger discussion. Does a single SP manage multiple environments. If yes, how does it get the environment details?
This also taps into the question of whether there should be a single SP for all things K8S instead of a different one per service-type
| valid `providerHints` values without trial and error. | ||
| - Define the status reporting mechanism using CloudEvents over NATS. | ||
| - Support both block storage and file storage types within a single SP instance. | ||
| - Define the `storage` service type schema (`StorageSpec`). |
There was a problem hiding this comment.
The service type is generic and can be supported by different SPs. So, I think this part should go either here: https://github.com/dcm-project/enhancements/blob/6d3350d8f0909dd1ef5d4c8291c9f2f560cbefab/enhancements/service-type-definitions/service-type-definitions.md or in a separate doc.
@dcm-project/maintainers WDYT?
There was a problem hiding this comment.
I agree it should go into the service-type-definition.md enhancement doc.
There was a problem hiding this comment.
@ygalblum, I wasn't sure on that, I'll go with your suggestions feedback comments.
|
|
||
| - Day 2 operations: resize, snapshot, clone, backup, or any update to an | ||
| existing volume. | ||
| - CSI driver installation or management — the driver must be pre-installed on |
There was a problem hiding this comment.
I think not. The Cluster SP created clusters owned by the user. It is up to the user to configure them. While DCM has access to the credentials, it should not abuse them
| - `storageType: file` requires `accessMode: ReadWriteMany` or `ReadOnlyMany`. | ||
| If `accessMode: ReadWriteOnce` is specified with `file`, the SP returns | ||
| `422 Unprocessable Entity`. | ||
| - `volumeMode: Block` is only valid when `storageType: block`. |
There was a problem hiding this comment.
As I wrote, this seems like a duplication
There was a problem hiding this comment.
Sure, I'll remove it
|
|
||
| ### User Flows | ||
|
|
||
| #### Create Block Storage Volume (End-to-End) |
There was a problem hiding this comment.
Agree. Just note that even the provisioner doesn't have to be different, only the parameter
| | **WaitForFirstConsumer StorageClasses** — PVC stays `Pending` indefinitely until mounted by a Pod. Since v1 provisions standalone storage only (no Pod mounting), these StorageClasses will not work with this SP. | Document as a known limitation. Check StorageClass `volumeBindingMode` at creation time; if `WaitForFirstConsumer` is detected and no workload attachment is requested, return `422 Unprocessable Entity`. | | ||
| | **PVC protection** — Kubernetes defers PVC deletion while it is actively used by a Pod (`kubernetes.io/pvc-protection` finalizer). The SP issues the DELETE but the PVC may remain until the consuming Pod is removed. | Publish `DELETING` status immediately on DELETE request. Continue watching the informer until the PVC is fully removed and then publish `DELETED`. | | ||
| | **PVC admission webhook rejection** — Cluster admission controllers can reject PVC creation for quota, policy, or naming reasons. | Map Kubernetes admission webhook errors (HTTP 403 or rejection bodies) to `422 Unprocessable Entity` in the SP's error handling layer. | | ||
| | **StorageClass exhaustion** — The target StorageClass may run out of available capacity on the backend. | The CSI driver reports this via PVC events. The informer catches `ProvisioningFailed` events and publishes a `FAILED` status with the event message as `statusMessage`. | |
There was a problem hiding this comment.
Are you sure about this. K8S is eventually consistent. So, it considers this case as transient and keep on trying to provision the volume (which will succeed once storage is available)
There was a problem hiding this comment.
I’ve outlined a few potential status codes to return for various failure scenarios from ks8.
We should decide which details to expose to the user and which should remain internal for logging and debugging.
|
|
||
| - The Storage SP is stateless with respect to its own process — all state lives | ||
| in Kubernetes PVC labels and in the DCM database. Upgrading the container | ||
| image and restarting via `podman-compose up --pull always` is sufficient. |
There was a problem hiding this comment.
Again, referencing podman, podman-compose or any other such command is incorrect
There was a problem hiding this comment.
Sure, agree, I need to modify this one
| The following items are explicitly out of scope for this document but must be | ||
| tracked: | ||
|
|
||
| - Update `enhancements/service-type-definitions/service-type-definitions.md`: |
There was a problem hiding this comment.
As I wrote, I don't this the definition of the Storage ServiceType should be in the current doc.
CSI Storage Service Provider
Introduce the Storage SP enhancement document defining a new service type for DCM.
The document covers the full lifecycle of managing persistent storage volumes on Kubernetes clusters via CSI, including:
According to - FLPATH-4113