Enhancement proposal: kcli Service Provider #43
Introduces the kcli SP, the first non-Kubernetes DCM service provider. It manages VMs and clusters through kweb (kcli's HTTP API), targeting development, testing, and homelab environments.
Made-with: Cursor
Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com>
Screenshots showing the full kcli SP integration flow through the DCM UI: providers, policies, service types, catalog items, instances, and resources.
Made-with: Cursor
Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com>
## Open Questions

1. **kweb version pinning.** kweb has no versioned API contract. Should the SP
Is there any benefit to supporting multiple versions? If not, we can pin the SP version to a specific kcli version.
Good question. kcli doesn't follow conventional versioning — it's been at 99.0.0 for ~6 years, with a new RPM build on every commit (e.g. 99.0.0.git.202604230909.6534e7c). There's no "kcli 1.2" vs "kcli 1.3" to pin against.
What we can do instead: each SP release documents the kweb git commit it was validated against. For example, v0.1.0 was tested against 6534e7c. When a kweb change breaks or improves something, we bump the SP version and update the validated commit. This gives traceability without pretending kcli has semver releases.
Will update the proposal to close Open Question #1 with this approach.
pin to a specific kcli release and test against it, or attempt to support
multiple kweb versions with feature detection?

2. **Multi-backend status mapping.** kweb VM status strings vary by backend
I think it would be good to normalize the statuses passed to the DCM control plane for all backends.
Agreed. The SP already normalizes kweb status strings to DCM's vocabulary (RUNNING, STOPPED, PROVISIONING, ERROR). This works across all kcli backends (libvirt, vSphere, KubeVirt, AWS, Azure, OpenStack, etc.) since the SP talks to kweb over HTTP and the kweb API is backend-agnostic.
Status strings may vary by backend (e.g. libvirt returns up/down, vSphere adds suspended), but the SP handles known values and maps unknowns to ERROR.
For multi-backend deployments (e.g. one kweb on libvirt, another on AWS), the admin deploys one kweb+SP pair per backend. Each registers as service type vm (or cluster) with a unique provider name (e.g. kcli-libvirt-apollo, kcli-aws-useast). DCM's Rego policy routes requests to the right provider based on catalog item metadata or provider_hints. The SP itself is backend-agnostic — it doesn't need to know which backend kweb is configured for.
Will update the proposal to make the status mapping explicit, clarify multi-backend support, and document the multi-backend deployment pattern.
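The normalization described in this reply could look roughly like the sketch below. It is an illustration only, not the SP's actual code: `normalizeStatus` and `knownStatuses` are hypothetical names, the table contains only the `up`/`down` values named in this thread, and trimming and lower-casing the input is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// DCM status vocabulary as listed in the reply above.
const (
	StatusRunning      = "RUNNING"
	StatusStopped      = "STOPPED"
	StatusProvisioning = "PROVISIONING"
	StatusError        = "ERROR"
)

// knownStatuses maps kweb status strings to DCM statuses.
// Only "up" and "down" are named in the thread; other backend-specific
// values (e.g. "suspended") would be added here once their mapping is decided.
var knownStatuses = map[string]string{
	"up":   StatusRunning,
	"down": StatusStopped,
}

// normalizeStatus (hypothetical helper) returns the DCM status for a kweb
// status string and maps anything it does not recognize to ERROR.
func normalizeStatus(kwebStatus string) string {
	key := strings.ToLower(strings.TrimSpace(kwebStatus))
	if s, ok := knownStatuses[key]; ok {
		return s
	}
	return StatusError
}

func main() {
	for _, s := range []string{"up", "down", "suspended", ""} {
		fmt.Printf("%q -> %s\n", s, normalizeStatus(s))
	}
}
```

Keeping the table as data rather than a `switch` makes it easy to extend when a new backend value is validated, without touching the fallback behavior.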
`error`). Should the SP implement per-backend mapping tables, or document
libvirt as the only supported backend for v1?

3. **Cluster type `kind`.** kweb's `swagger.yml` lists `kind` as a valid cluster
This could be resolved by my answer to question #1. If we pin to a specific version, we only support what works for that version. Once fixes are available, we can bump the SP version with a new kcli.
Makes sense in principle. Since kcli doesn't have semver releases (see response above), we'll track validated git commits instead. The effect is the same: each SP release is tested against a known kweb state, and we bump when upstream fixes land.
## Summary

The kcli Service Provider is a DCM Service Provider that manages virtual
I like the idea of a service provider that supports more than one service type. Potentially this could simplify how we manage SPs and reduce the footprint.
Thanks! The kcli SP is a single Go binary handling both vm and cluster service types against one kweb backend, which keeps the deployment footprint minimal.
Today, registration must be done once per service type, each with its own URL and provider name, for clear endpoint separation and to avoid complex capability matrices. VM traffic and cluster traffic don't hit the same path (e.g. …/vm vs …/cluster, and the service health URLs), so DCM ends up with "two provider rows" even though it's one process. Re-registration is also easy, since it affects only one "row". If we wanted one registration call that lists several types at once, we'd need to define how that works (API shape, registry, how SPM picks the right base URL). That's a separate change from this proposal.
Agreed. The current dual registration (one per service type) is correct and works well. The proposal documents it as the expected approach, not a workaround. Thanks for the clear explanation of why this is the right design.
with a standalone kweb instance, enabling DCM to provision infrastructure on any
hypervisor backend that kcli supports (primarily libvirt/KVM for homelab use).

Because DCM registration is per service type, the kcli SP registers **twice**
I think we should change this and allow registering for multiple service types.
Agreed. Today the SPM API requires one registration per service type, so the kcli SP registers twice (kcli-vm and kcli-cluster). This works but is a workaround.
A future SPM enhancement to support multi-service-type registration in a single call would be the right fix — it would benefit any SP that handles multiple resource types.
Will update the proposal to document dual registration as a known limitation and reference this as a candidate SPM enhancement.
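To make the dual registration concrete, here is a rough sketch of one process preparing two registration payloads. The provider names `kcli-vm` and `kcli-cluster` and the `POST /api/v1/providers` endpoint come from this conversation; the `registration` struct, its JSON field names, the `buildRegistrations` helper, and the example base URL are assumptions for illustration, and the real SPM schema may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// registration is a hypothetical payload shape, not the real SPM schema.
type registration struct {
	Name        string `json:"name"`
	ServiceType string `json:"service_type"`
	Endpoint    string `json:"endpoint"`
}

// buildRegistrations returns one registration per service type, each with
// its own provider name and endpoint, all served by the same process.
func buildRegistrations(baseURL string) []registration {
	return []registration{
		{Name: "kcli-vm", ServiceType: "vm", Endpoint: baseURL + "/vm"},
		{Name: "kcli-cluster", ServiceType: "cluster", Endpoint: baseURL + "/cluster"},
	}
}

func main() {
	// The base URL is a placeholder; a real SP would read it from config.
	for _, r := range buildRegistrations("http://sp.example.internal:8080") {
		body, err := json.Marshal(r)
		if err != nil {
			panic(err)
		}
		// A real client would send this body with net/http; printing keeps
		// the sketch free of network dependencies.
		fmt.Printf("POST /api/v1/providers %s\n", body)
	}
}
```

Because each service type has its own row, re-registering one of them leaves the other untouched.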
{
  "spec": {
    "service_type": "cluster",
Do we want to define a kcli SP-specific service type, or should we use an existing type?
Same answer as above — existing types (`vm`, `cluster`) for consistency with other SPs.
by anyone who can reach it. Additionally, kweb exposes cluster-admin kubeconfigs
via `GET /kubes/{name}/kubeconfig` without any access control.

**Mitigation:** The homelab/dev/test deployment model assumes a **trusted
Reasonable approach to me. This SP would be non-production. We need to document it as such.
Agreed. Will add a prominent Production Readiness disclaimer to the proposal and to the SP repository's README and docs. Something along the lines of:
This service provider is not intended for production use. It is designed for development, testing, and homelab environments. kweb has no authentication, no TLS, no rate limiting, and no SLA guarantees. The kcli SP inherits these limitations. For production workloads, use the KubeVirt SP (VMs) or ACM Cluster SP (clusters) instead.
I think we also need to define how to manage non-production SPs. Do we want a community SP repo, for example (outside the dcm-project org)?
Not for this PR obviously, but I think we may need to provide a guide/ruleset for developers. I'm also thinking about authN/authZ, and how (and if) we want to verify that an SP is safe (whatever that means).
Just random doubts :)
@ygalblum wdyt?
@gciavarrini re: this vs outside dcm-project org repositories for community, for me the key question is whether DCM is something by itself, or DCM is the upstream for some RH product. If DCM is an upstream, community SPs could live in dcm-project org. But if DCM is its own "downstream", then a different org is probably better.
#### kweb Credential Exposure

**Risk:** kweb's `GET /kubes/{name}/kubeconfig` returns raw cluster-admin
We could provide an endpoint for SSH/VNC access. Not sure why we need to expose credentials via the API.
Good point. The v1 SP does not proxy or expose VNC passwords or kubeconfig credentials today.
For VMs: the SP already surfaces the VM's IP address in the `GET /vms/{id}` response. We'll also add the default SSH user (e.g. `fedora`, `core`, `centos` — returned by kweb) so users know how to connect (`ssh fedora@192.168.x.x`). No raw credentials exposed.
For clusters: as noted above, we'll follow the ACM SP pattern — embed base64-encoded kubeconfig in the `GET /clusters/{id}` response when status is `RUNNING`.
Console/VNC access via a `GET /vms/{id}/console` endpoint (returning structured connection info with time-limited tokens and audit logging) is a good v2 candidate. For v1, kweb's console mechanism requires WebSocket proxying (`websockify`) which is a significant scope increase.
Note: the KubeVirt SP currently doesn't return IP, credentials, or console access either — so the kcli SP will actually be ahead on this front.
@pkliczewski currently DCM does not provide this. For VMs, users can provide an SSH public key to inject, and DCM returns the machine's IP address for the user to connect. For clusters, DCM returns the API and kubeconfig.
DCM is focused on managing the resources. I don't see it handling SSH or VNC anytime soon.
#### Polling Latency vs. Informer-Based Providers

**Risk:** Other DCM providers use Kubernetes informers for near-real-time status
I think this is expected. Not all SPs will be K8s/OCP-based.
Exactly. Thanks for confirming — good to know this is expected for non-K8s/OCP providers. The poll interval is configurable via `MONITOR_POLL_INTERVAL` for tighter feedback in CI/testing.
Yes, it's even defined as an option in the Service Provider Status Report Implementation enhancement document: https://github.com/dcm-project/enhancements/blob/main/enhancements/service-provider-status-report-implementation/service-provider-status-report-implementation.md#pattern-b-polling
#### Cons

- **Breaks DCM provider conventions.** No existing DCM SP shells out to a CLI.
I think this is an implementation detail. It is up to the developer to implement the SP the way they want. There are always tradeoffs.
Agreed, included for transparency. One additional note worth mentioning: kcli previously offered a gRPC API, but it was deprecated in favor of the kweb REST API. CLI wrapping would be the only other integration path, and it's more fragile due to unstructured text output and version-sensitive parsing. kweb was the natural choice.
I'm assuming the CLI uses kcli's REST APIs. So, writing a REST wrapper for a CLI that uses REST seems a bit redundant.
@ygalblum Both kcli (CLI) and kweb (REST API) are frontends to the same kvirt library
I see. Anyhow, I think using the REST API is cleaner than running CLI from code. So, we're good
- Resolve Q1 (version pinning): track kweb git commit per SP release
- Resolve Q2 (multi-backend status): SP supports all kcli backends
- Add Production Readiness disclaimer section
- Clarify kweb configuration (SP uses KWEB_URL, no kcli dependency)
- Document multi-backend deployment pattern with Rego routing
- Document kubeconfig embedding (ACM SP pattern) for cluster access
- Document VM access: ip + ssh_user in GET /vms/{id} response
- Note dual registration as SPM limitation, propose enhancement
- Add gRPC deprecation note to CLI wrapping alternative
Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com>
Made-with: Cursor
- Change cluster ready status from ACTIVE to READY (matches ACM SP)
- Expand kweb default backend docs: environment-dependent behavior (libvirt, KubeVirt in-pod, macOS Homebrew, or exit)
- Multi-backend table: show separate VM and cluster provider names
- Attribute gRPC deprecation to kcli maintainer (Karim Boumedhel)
- Update graduation criteria with full status vocabulary
Signed-off-by: Pedro Garcia Quiles <pgarciaq@redhat.com>
Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com>
Made-with: Cursor
Implementation update
Hi @pkliczewski, thanks again for the thorough review. All the enhancements we committed to in the review comments have now been implemented in both the proposal and the codebase. Here's a summary:
Proposal updates (this PR)
Implementation (dcm-kcli-provider)
All the above changes are implemented and pushed to pgarciaq/dcm-kcli-provider:
…outes
Add implementation history entries for April 23-25:
- Kubeconfig and ssh_user in API responses
- mergeKcliHints() for forwarding provider_hints.kcli params to kweb
- Traefik routes example and combined Rego policy
- Cluster catalog item with image and node count fields
- v0.1.1 release
Also documents the node OS image override mechanism and additional kweb parameter forwarding in the cluster creation section.
Made-with: Cursor
Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com>
gciavarrini left a comment
nit: the PR description states 148+ specs while the file says 141. I would suggest avoiding such a strict description, just to reduce confusion :)
The kcli Service Provider is a DCM Service Provider that manages virtual
machines and Kubernetes clusters through
[kcli](https://github.com/karmab/kcli)'s HTTP API (kweb). It is designed for
Let's add the link to kweb too: https://kcli.readthedocs.io/en/latest/index.html#kweb
Good catch — done, added the kweb link. Thanks!
The kcli SP must successfully register with DCM for each service type it
provides. During startup, after the HTTP server is ready, the SP uses the DCM
registration client to send two requests to the SP API registration endpoint:
`POST /api/v1alpha1/providers`.
POST /api/v1/providers
You're right — the other enhancement docs (sp-registration-flow, kubevirt-sp) consistently use POST /api/v1/providers. Fixed. Thanks!
- Add hyperlink to kweb docs in the Summary section
- Fix registration endpoint from /api/v1alpha1/providers to /api/v1/providers to match the sp-registration-flow and kubevirt-sp enhancement docs
Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Per review feedback — the exact number goes stale as tests are added. Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
@gciavarrini re: the spec-count nit: good catch, the number was stale in both the PR description and the document. I've dropped the exact count from both; the text now just says "Ginkgo specs across 8 suites" without a number, so it won't go stale again as tests are added.
Summary
Enhancement proposal for the kcli Service Provider — the first
non-Kubernetes DCM service provider. It manages VMs and Kubernetes
clusters through kweb (kcli's HTTP
API), targeting development, testing, and homelab environments.
Implementation: pgarciaq/dcm-kcli-provider
Container image: quay.io/pgarciaq/dcm-kcli-provider
Important
This SP is intended for development and testing purposes only, not for production.
kcli is a community tool that Red Hat does not provide support for.
Accordingly, this SP should be considered community-only. Despite this scope limitation, the kcli SP
is architecturally significant as a reference implementation: it demonstrates DCM's ability to provision
resources through non-Kubernetes execution planes, including third-party Kubernetes clusters (k3s,
microshift, generic), cloud VMs, and bare-metal/libvirt hosts — capabilities not covered by any
production-targeted SP today.
What makes this SP unique
The kcli SP introduces several patterns not found in existing DCM
service providers (KubeVirt SP, k8s-container SP, ACM Cluster SP):
Non-Kubernetes execution plane. First DCM SP that integrates via
a standalone HTTP API (kweb) rather than the Kubernetes API. No
kubeconfig, no client-go, no CRDs. Enables DCM on bare-metal/libvirt
without a management cluster.
Dual service-type registration. One binary registers as both `vm` and
`cluster` providers with separate endpoints and provider IDs. All
other SPs register a single service type.
Name-prefix identity (not K8s labels). Uses a `dcm-` prefix on kcli
resource names + bbolt ID mapping instead of `dcm.project/*` labels on
Kubernetes objects.
Embedded state store (bbolt). Authoritative ID-to-name mapping
lives in a local embedded database, not in Kubernetes etcd. Enables
orphan detection and crash recovery without external dependencies.
Polling-based status model. Contrasts with informer-driven
KubeVirt/k8s-container SPs. Includes debouncing and cluster creation
timeout to handle kweb's async cluster provisioning.
Profile resolution via `provider_hints`. Explicit precedence chain
(`provider_hints.kcli.profile` → `guest_os.type` → default) with
runtime validation against live kweb profile cache. Fills a gap where
catalog specs don't map directly to kcli profiles.
Cluster type via `provider_hints.kcli.cluster_type`. The catalog
`ClusterSpec` has no `cluster_type` field; `provider_hints` is the
only mechanism to select k3s/generic/openshift/microshift/hypershift.
Upstream error normalization. kweb returns 2xx with failure JSON,
HTML error pages, and empty bodies. The SP normalizes all of these
into consistent RFC 7807 responses — a challenge unique to integrating
with a non-API-first upstream.
Health = downstream dependency. SP health is downstream-dependent
(probes kweb `/host`), unlike KubeVirt/k8s-container which report
self-health only.
Homelab-first design. Intentionally scoped for dev/test/homelab.
Trusted-network assumption, single-replica, embedded store — all
deliberate trade-offs that simplify deployment without compromising
the DCM integration contract.
Implementation status
`POST ?id=`, `DELETE /{id}`, `GET /health`)
design) with all critical/high findings fixed
`--race`
Screenshots
E2E test walkthrough from the DCM UI on an Apollo hypervisor running the full DCM stack with the kcli SP.
Providers
Both kcli providers (VM and cluster) registered and in ready state:
Cluster provider configuration — dual registration with separate endpoints:
VM provider configuration:
Policies
Rego policy routing all VM requests to the kcli-vm provider:
Service types
DCM service type registry: kcli registers as both `vm` and `cluster`:
Catalog items
Catalog items for Fedora VM, K3s Cluster, and Pet Clinic:
Fedora VM catalog item — editable fields for OS image, memory, and vCPUs:
Instances and Resources
Catalog item instance created from the Fedora VM template:
The resulting resource provisioned by kcli-vm, status APPROVED:
Review checklist