Enhancement proposal: kcli Service Provider by pgarciaq · Pull Request #43 · dcm-project/enhancements

pgarciaq · 2026-04-22T17:11:58Z

Summary

Enhancement proposal for the kcli Service Provider — the first
non-Kubernetes DCM service provider. It manages VMs and Kubernetes
clusters through kweb (kcli's HTTP
API), targeting development, testing, and homelab environments.

Implementation: pgarciaq/dcm-kcli-provider
Container image: quay.io/pgarciaq/dcm-kcli-provider

Important

This SP is intended for development and testing purposes only, not for production.
kcli is a community tool that Red Hat does not provide support for.
Accordingly, this SP should be considered community-only. Despite this scope limitation, the kcli SP
is architecturally significant as a reference implementation: it demonstrates DCM's ability to provision
resources through non-Kubernetes execution planes, including third-party Kubernetes clusters (k3s,
microshift, generic), cloud VMs, and bare-metal/libvirt hosts — capabilities not covered by any
production-targeted SP today.

What makes this SP unique

The kcli SP introduces several patterns not found in existing DCM
service providers (KubeVirt SP, k8s-container SP, ACM Cluster SP):

- Non-Kubernetes execution plane. First DCM SP that integrates via
  a standalone HTTP API (kweb) rather than the Kubernetes API. No
  kubeconfig, no client-go, no CRDs. Enables DCM on bare-metal/libvirt
  without a management cluster.
- Dual service-type registration. One binary registers as both vm
  and cluster providers with separate endpoints and provider IDs. All
  other SPs register a single service type.
- Name-prefix identity (not K8s labels). Uses dcm- prefix on kcli
  resource names + bbolt ID mapping instead of dcm.project/* labels on
  Kubernetes objects.
- Embedded state store (bbolt). Authoritative ID-to-name mapping
  lives in a local embedded database, not in Kubernetes etcd. Enables
  orphan detection and crash recovery without external dependencies.
- Polling-based status model. Contrasts with informer-driven
  KubeVirt/k8s-container SPs. Includes debouncing and cluster creation
  timeout to handle kweb's async cluster provisioning.
- Profile resolution via provider_hints. Explicit precedence chain
  (provider_hints.kcli.profile → guest_os.type → default) with
  runtime validation against live kweb profile cache. Fills a gap where
  catalog specs don't map directly to kcli profiles.
- Cluster type via provider_hints.kcli.cluster_type. The catalog
  ClusterSpec has no cluster_type field; provider_hints is the
  only mechanism to select k3s/generic/openshift/microshift/hypershift.
- Upstream error normalization. kweb returns 2xx with failure JSON,
  HTML error pages, and empty bodies. The SP normalizes all of these
  into consistent RFC 7807 responses, a challenge unique to integrating
  with a non-API-first upstream.
- Health = downstream dependency. SP health is downstream-dependent
  (probes kweb /host), unlike KubeVirt/k8s-container which report
  self-health only.
- Homelab-first design. Intentionally scoped for dev/test/homelab.
  Trusted-network assumption, single-replica, embedded store: all
  deliberate trade-offs that simplify deployment without compromising
  the DCM integration contract.
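
The profile precedence chain described above can be sketched roughly as follows. This is an illustration only, not the SP's actual code: the function name, its parameters, the use of guest_os.type directly as a profile name, and the choice to reject an unknown profile rather than fall through to the next candidate are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveProfile is a hypothetical helper. It applies the precedence chain
// provider_hints.kcli.profile -> guest_os.type -> default and then validates
// the winner against the set of profiles currently known to kweb.
// Assumption: an unknown profile is an error, not a reason to fall through.
func resolveProfile(hintProfile, guestOSType, defaultProfile string, known map[string]bool) (string, error) {
	candidate := defaultProfile
	switch {
	case hintProfile != "":
		candidate = hintProfile
	case guestOSType != "":
		candidate = guestOSType
	}
	if candidate == "" {
		return "", errors.New("no profile could be resolved")
	}
	if !known[candidate] {
		return "", fmt.Errorf("profile %q not found in kweb profile cache", candidate)
	}
	return candidate, nil
}

func main() {
	// Stand-in for the live kweb profile cache.
	cache := map[string]bool{"fedora": true, "centos": true}
	profile, err := resolveProfile("", "fedora", "centos", cache)
	fmt.Println(profile, err)
}
```

Validating after the chain has picked a winner keeps the precedence rule simple to reason about: an explicit hint is never silently replaced by a lower-priority source.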

Implementation status

- Full SPM generic resource protocol (POST ?id=, DELETE /{id},
  GET /health)
- E2E tested on Apollo hypervisor (full DCM stack + kcli v99.0)
- Adversarial review completed (security, correctness, operations,
  design) with all critical/high findings fixed
- Ginkgo specs across 8 suites, all passing with --race

Screenshots

E2E test walkthrough from the DCM UI on an Apollo hypervisor running the full DCM stack with the kcli SP.

Providers

Both kcli providers (VM and cluster) registered and in ready state:

Cluster provider configuration — dual registration with separate endpoints:

VM provider configuration:

Policies

Rego policy routing all VM requests to the kcli-vm provider:

Service types

DCM service type registry — kcli registers as both vm and cluster:

Catalog items

Catalog items for Fedora VM, K3s Cluster, and Pet Clinic:

Fedora VM catalog item — editable fields for OS image, memory, and vCPUs:

Instances and Resources

Catalog item instance created from the Fedora VM template:

The resulting resource provisioned by kcli-vm, status APPROVED:

Review checklist

Proposal structure follows the enhancement template
API design aligns with DCM SP contracts (SPM, health, status)
Risks and mitigations are comprehensive
Alternatives were considered and documented

Introduces the kcli SP — the first non-Kubernetes DCM service provider. It manages VMs and clusters through kweb (kcli's HTTP API), targeting development, testing, and homelab environments. Made-with: Cursor Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com>

Screenshots showing the full kcli SP integration flow through the DCM UI: providers, policies, service types, catalog items, instances, and resources. Made-with: Cursor Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com>

pkliczewski · 2026-04-24T08:35:53Z

+
+## Open Questions
+
+1. **kweb version pinning.** kweb has no versioned API contract. Should the SP


Is there any benefit to supporting multiple versions? If not, we can pin the SP version to a specific kcli version.

Good question. kcli doesn't follow conventional versioning — it's been at 99.0.0 for ~6 years, with a new RPM build on every commit (e.g. 99.0.0.git.202604230909.6534e7c). There's no "kcli 1.2" vs "kcli 1.3" to pin against.

What we can do instead: each SP release documents the kweb git commit it was validated against. For example, v0.1.0 was tested against 6534e7c. When a kweb change breaks or improves something, we bump the SP version and update the validated commit. This gives traceability without pretending kcli has semver releases.

Will update the proposal to close Open Question #1 with this approach.
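
As a rough illustration of tracking a validated commit, the sketch below pulls the short git hash out of a build string in the format quoted above and compares it with the commit an SP release was tested against. The helper names are hypothetical, and how the SP would obtain kweb's build string at runtime is not covered by this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// commitFromVersion extracts the trailing short hash from a kcli build string
// such as "99.0.0.git.202604230909.6534e7c". Hypothetical helper.
func commitFromVersion(version string) (string, bool) {
	idx := strings.Index(version, ".git.")
	if idx < 0 {
		return "", false
	}
	// What follows ".git." is expected to be "<timestamp>.<short-hash>".
	parts := strings.Split(version[idx+len(".git."):], ".")
	if len(parts) != 2 || parts[1] == "" {
		return "", false
	}
	return parts[1], true
}

// matchesValidated reports whether a build string carries the commit that a
// given SP release was validated against.
func matchesValidated(version, validatedCommit string) bool {
	commit, ok := commitFromVersion(version)
	return ok && commit == validatedCommit
}

func main() {
	build := "99.0.0.git.202604230909.6534e7c"
	fmt.Println(matchesValidated(build, "6534e7c"))
}
```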

pkliczewski · 2026-04-24T08:36:55Z

+   pin to a specific kcli release and test against it, or attempt to support
+   multiple kweb versions with feature detection?
+
+2. **Multi-backend status mapping.** kweb VM status strings vary by backend


I think it would be good to normalize statuses passed to DCM control plane for all backends

Agreed. The SP already normalizes kweb status strings to DCM's vocabulary (RUNNING, STOPPED, PROVISIONING, ERROR). This works across all kcli backends (libvirt, vSphere, KubeVirt, AWS, Azure, OpenStack, etc.) since the SP talks to kweb over HTTP and the kweb API is backend-agnostic.

Status strings may vary by backend (e.g. libvirt returns up/down, vSphere adds suspended), but the SP handles known values and maps unknowns to ERROR.

For multi-backend deployments (e.g. one kweb on libvirt, another on AWS), the admin deploys one kweb+SP pair per backend. Each registers as service type vm (or cluster) with a unique provider name (e.g. kcli-libvirt-apollo, kcli-aws-useast). DCM's Rego policy routes requests to the right provider based on catalog item metadata or provider_hints. The SP itself is backend-agnostic — it doesn't need to know which backend kweb is configured for.

Will update the proposal to make the status mapping explicit, clarify multi-backend support, and document the multi-backend deployment pattern.
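
A minimal sketch of that normalization, assuming a simple lookup table: only the libvirt up/down strings mentioned above are mapped here, and the RUNNING/STOPPED targets chosen for them are an assumption. Everything else, including vSphere's suspended (whose mapping is not stated in this thread), falls through to ERROR. The identifiers are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// DCM status vocabulary as listed in this thread.
const (
	StatusRunning      = "RUNNING"
	StatusStopped      = "STOPPED"
	StatusProvisioning = "PROVISIONING"
	StatusError        = "ERROR"
)

// kwebToDCM is an illustrative table, not the SP's real one. A real table
// would carry an entry for every status string the SP knows about.
var kwebToDCM = map[string]string{
	"up":   StatusRunning,
	"down": StatusStopped,
}

// normalizeStatus maps a kweb status string to the DCM vocabulary and
// returns ERROR for anything it does not recognize.
func normalizeStatus(kwebStatus string) string {
	key := strings.ToLower(strings.TrimSpace(kwebStatus))
	if status, ok := kwebToDCM[key]; ok {
		return status
	}
	return StatusError
}

func main() {
	for _, s := range []string{"up", "down", "something-new"} {
		fmt.Printf("%s -> %s\n", s, normalizeStatus(s))
	}
}
```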

pkliczewski · 2026-04-24T08:39:00Z

+   `error`). Should the SP implement per-backend mapping tables, or document
+   libvirt as the only supported backend for v1?
+
+3. **Cluster type `kind`.** kweb's `swagger.yml` lists `kind` as a valid cluster


This could be resolved by my answer to question #1. If we pin to a specific version, we only support what works for that version. Once fixes are available, we can bump the SP version with the new kcli.

Makes sense in principle. Since kcli doesn't have semver releases (see response above), we'll track validated git commits instead. The effect is the same: each SP release is tested against a known kweb state, and we bump when upstream fixes land.

pkliczewski · 2026-04-24T08:40:02Z

+
+## Summary
+
+The kcli Service Provider is a DCM Service Provider that manages virtual


I like the idea of a service provider that supports more than one service type. Potentially this could simplify how we manage SPs and reduce the footprint.

Thanks! The kcli SP is a single Go binary handling both vm and cluster service types against one kweb backend, which keeps the deployment footprint minimal.

Today, registration must be done once per service type, each with its own URL and provider name. This keeps endpoints clearly separated and avoids complex capability matrices. VM traffic and cluster traffic don’t hit the same path (e.g. …/vm vs …/cluster, and the service health URLs), so DCM ends up with "two provider rows" even though it’s one process. Re-registration is also easy, since it affects only one "row". If we wanted one registration call that lists several types at once, we’d need to define how that works (API shape, registry, how SPM picks the right base URL). That’s a separate change from this proposal.

Agreed. The current dual registration (one per service type) is correct and works well. The proposal documents it as the expected approach, not a workaround. Thanks for the clear explanation of why this is the right design.

pkliczewski · 2026-04-24T08:42:25Z

+with a standalone kweb instance, enabling DCM to provision infrastructure on any
+hypervisor backend that kcli supports (primarily libvirt/KVM for homelab use).
+
+Because DCM registration is per service type, the kcli SP registers **twice**


I think we should change this and allow registering for multiple service types.

Agreed. Today the SPM API requires one registration per service type, so the kcli SP registers twice (kcli-vm and kcli-cluster). This works but is a workaround.

A future SPM enhancement to support multi-service-type registration in a single call would be the right fix — it would benefit any SP that handles multiple resource types.

Will update the proposal to document dual registration as a known limitation and reference this as a candidate SPM enhancement.

pkliczewski · 2026-04-24T08:57:19Z

+
+{
+  "spec": {
+    "service_type": "cluster",


Do we want to define a kcli-SP-specific service type, or should we use an existing type?

Same answer as above — existing types (`vm`, `cluster`) for consistency with other SPs.

pkliczewski · 2026-04-24T09:56:53Z

+by anyone who can reach it. Additionally, kweb exposes cluster-admin kubeconfigs
+via `GET /kubes/{name}/kubeconfig` without any access control.
+
+**Mitigation:** The homelab/dev/test deployment model assumes a **trusted


Reasonable approach to me. This SP would be non-production; we need to document it as such.

Agreed. Will add a prominent Production Readiness disclaimer to the proposal and to the SP repository's README and docs. Something along the lines of:

This service provider is not intended for production use. It is designed for development, testing, and homelab environments. kweb has no authentication, no TLS, no rate limiting, and no SLA guarantees. The kcli SP inherits these limitations. For production workloads, use the KubeVirt SP (VMs) or ACM Cluster SP (clusters) instead.

I think we also need to define how to manage non-production SPs. Do we want to have a community SP repo, for example (outside the dcm-project org)?
Not for this PR obviously, but I think we may need to provide a guide/ruleset for developers. I'm also thinking about authN/authZ, and how (and if) we want to verify that an SP is safe (whatever that means).
Just random doubts :)

@ygalblum wdyt?

@gciavarrini re: this vs outside dcm-project org repositories for community, for me the key question is whether DCM is something by itself, or DCM is the upstream for some RH product. If DCM is an upstream, community SPs could live in dcm-project org. But if DCM is its own "downstream", then a different org is probably better.

pkliczewski · 2026-04-24T09:58:30Z

+
+#### kweb Credential Exposure
+
+**Risk:** kweb's `GET /kubes/{name}/kubeconfig` returns raw cluster-admin


We could provide an endpoint for SSH/VNC access. Not sure why we need to expose credentials via the API.

Good point. The v1 SP does not proxy or expose VNC passwords or kubeconfig credentials today.

For VMs: the SP already surfaces the VM's IP address in the `GET /vms/{id}` response. We'll also add the default SSH user (e.g. `fedora`, `core`, `centos` — returned by kweb) so users know how to connect (`ssh fedora@192.168.x.x`). No raw credentials exposed.

For clusters: as noted above, we'll follow the ACM SP pattern — embed base64-encoded kubeconfig in the `GET /clusters/{id}` response when status is `RUNNING`.

Console/VNC access via a `GET /vms/{id}/console` endpoint (returning structured connection info with time-limited tokens and audit logging) is a good v2 candidate. For v1, kweb's console mechanism requires WebSocket proxying (`websockify`) which is a significant scope increase.

Note: the KubeVirt SP currently doesn't return IP, credentials, or console access either — so the kcli SP will actually be ahead on this front.

@pkliczewski currently DCM does not provide this. For VMs, users can provide an SSH public key to inject, and DCM will return the machine's IP address for the user to connect. For clusters, DCM returns the API and kubeconfig.
DCM is focused on managing the resources. I don't see it handling SSH or VNC anytime soon.

pkliczewski · 2026-04-24T09:59:37Z

+
+#### Polling Latency vs. Informer-Based Providers
+
+**Risk:** Other DCM providers use Kubernetes informers for near-real-time status


I think this is expected. Not all SPs will be k8s/ocp based

Exactly. Thanks for confirming — good to know this is expected for non-K8s/OCP providers. The poll interval is configurable via `MONITOR_POLL_INTERVAL` for tighter feedback in CI/testing.

Yes, it's even defined as an option in the Service Provider Status Report Implementation enhancement document: https://github.com/dcm-project/enhancements/blob/main/enhancements/service-provider-status-report-implementation/service-provider-status-report-implementation.md#pattern-b-polling
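
For illustration, reading such a setting could look like the sketch below. Only the variable name MONITOR_POLL_INTERVAL comes from this thread; the Go duration syntax ("5s", "1m"), the 30-second fallback, and the helper name are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// pollInterval parses a raw interval value and returns the fallback when the
// value is empty, malformed, or not positive. Hypothetical helper.
func pollInterval(raw string, fallback time.Duration) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return fallback
	}
	return d
}

func main() {
	interval := pollInterval(os.Getenv("MONITOR_POLL_INTERVAL"), 30*time.Second)
	fmt.Println("polling kweb every", interval)
}
```

Falling back instead of failing on a bad value keeps a misconfigured CI job polling at a known rate, at the cost of hiding the typo; logging the fallback would be a reasonable addition.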

pkliczewski · 2026-04-24T10:01:07Z

+
+#### Cons
+
+- **Breaks DCM provider conventions.** No existing DCM SP shells out to a CLI.


I think this is an implementation detail. It is up to the developer to implement the SP the way they want. There are always trade-offs.

Agreed, included for transparency. One additional note worth mentioning: kcli previously offered a gRPC API, but it was deprecated in favor of the kweb REST API. CLI wrapping would be the only other integration path, and it's more fragile due to unstructured text output and version-sensitive parsing. kweb was the natural choice.

I'm assuming the CLI uses kcli's REST APIs. So, writing a REST wrapper for a CLI that uses REST seems a bit redundant.

@ygalblum Both kcli (CLI) and kweb (REST API) are frontends to the same kvirt library

I see. Anyhow, I think using the REST API is cleaner than running CLI from code. So, we're good

- Resolve Q1 (version pinning): track kweb git commit per SP release
- Resolve Q2 (multi-backend status): SP supports all kcli backends
- Add Production Readiness disclaimer section
- Clarify kweb configuration (SP uses KWEB_URL, no kcli dependency)
- Document multi-backend deployment pattern with Rego routing
- Document kubeconfig embedding (ACM SP pattern) for cluster access
- Document VM access: ip + ssh_user in GET /vms/{id} response
- Note dual registration as SPM limitation, propose enhancement
- Add gRPC deprecation note to CLI wrapping alternative

Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com>
Made-with: Cursor

- Change cluster ready status from ACTIVE to READY (matches ACM SP)
- Expand kweb default backend docs: environment-dependent behavior (libvirt, KubeVirt in-pod, macOS Homebrew, or exit)
- Multi-backend table: show separate VM and cluster provider names
- Attribute gRPC deprecation to kcli maintainer (Karim Boumedhel)
- Update graduation criteria with full status vocabulary

Signed-off-by: Pedro Garcia Quiles <pgarciaq@redhat.com>
Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com>
Made-with: Cursor

pgarciaq · 2026-04-24T21:09:57Z

Implementation update

Hi @pkliczewski — thanks again for the thorough review. All the enhancements we committed to in the review comments have now been implemented in both the proposal and the codebase. Here's a summary:

Proposal updates (this PR)

Closed Open Questions #1 and #2: kweb version tracking via validated git commits; status normalization across all backends
Production Readiness disclaimer added prominently in Summary and throughout
kweb configuration clarified: SP has no dependency on Python/kcli; kweb's default backend is environment-dependent (libvirt if sockets exist, KubeVirt in-pod, macOS Homebrew, or exit — not always libvirt)
Multi-backend deployment pattern documented with separate VM and cluster provider names per SP instance
Dual registration documented as a workaround; suggested SPM enhancement for multi-service-type registration
Cluster status: ACTIVE → READY to align with the ACM Cluster SP
Kubeconfig embedding follows the ACM SP pattern: base64-encoded kubeconfig + api_endpoint in GET /clusters/{id} when status is READY
VM access: ip and ssh_user returned in GET /vms/{id} responses
gRPC deprecation attributed to kcli maintainer (Karim Boumedhel)
Graduation criteria updated with the full status vocabulary for both VMs and clusters

Implementation (dcm-kcli-provider)

All the above changes are implemented and pushed to pgarciaq/dcm-kcli-provider:

OpenAPI schema updated with kubeconfig, api_endpoint, ip, ssh_user fields
GetCluster embeds base64 kubeconfig when status is READY; extractAPIEndpoint follows current-context for correctness with multi-cluster kubeconfigs
GetVM and ListVMs return ip and ssh_user at the top level
Observability: slog.Warn on silent kweb failures in GetVM/GetCluster
Safety: io.LimitReader (10 MB cap) on kweb HTTP responses
6 new test cases covering error paths, base64 round-trip, and multi-cluster kubeconfig
All 141+ tests pass, go vet clean
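
The response-size cap can be sketched with io.LimitReader as below. This is only an illustration of the technique: the helper names are hypothetical, and treating an oversized body as an error (rather than truncating it silently) is an assumption about the SP's behavior.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// maxBody mirrors the 10 MB cap mentioned above.
const maxBody = 10 << 20

var errBodyTooLarge = errors.New("upstream response exceeds size cap")

// readCapped reads at most limit bytes from r. It asks for one extra byte so
// that a body larger than the cap can be told apart from one that fits.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errBodyTooLarge
	}
	return data, nil
}

// repeatReader returns a reader over n filler bytes; it stands in for an
// HTTP response body in this sketch.
func repeatReader(n int) io.Reader {
	return strings.NewReader(strings.Repeat("x", n))
}

func main() {
	body, err := readCapped(repeatReader(1024), maxBody)
	fmt.Println(len(body), err)

	_, err = readCapped(repeatReader(11), 10)
	fmt.Println(err)
}
```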

…outes

Add implementation history entries for April 23-25:
- Kubeconfig and ssh_user in API responses
- mergeKcliHints() for forwarding provider_hints.kcli params to kweb
- Traefik routes example and combined Rego policy
- Cluster catalog item with image and node count fields
- v0.1.1 release

Also documents the node OS image override mechanism and additional kweb parameter forwarding in the cluster creation section.

Made-with: Cursor
Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com>

gciavarrini

nit: the PR description stated 148+ specs while the file says 141. I would suggest avoiding such a strict description, just to reduce confusion :)

gciavarrini · 2026-04-24T13:45:33Z

+
+The kcli Service Provider is a DCM Service Provider that manages virtual
+machines and Kubernetes clusters through
+[kcli](https://github.com/karmab/kcli)'s HTTP API (kweb). It is designed for


let's add the link to kweb too https://kcli.readthedocs.io/en/latest/index.html#kweb

Good catch — done, added the kweb link. Thanks!

gciavarrini · 2026-05-04T10:02:18Z

+
+## Summary
+
+The kcli Service Provider is a DCM Service Provider that manages virtual


Today, registration must be done once per service type, each with its own URL and provider name. This keeps endpoints clearly separated and avoids complex capability matrices. VM traffic and cluster traffic don’t hit the same path (e.g. …/vm vs …/cluster, and the service health URLs), so DCM ends up with "two provider rows" even though it’s one process. Re-registration is also easy, since it affects only one "row". If we wanted one registration call that lists several types at once, we’d need to define how that works (API shape, registry, how SPM picks the right base URL). That’s a separate change from this proposal.

gciavarrini · 2026-05-04T10:19:50Z

+The kcli SP must successfully register with DCM for each service type it
+provides. During startup, after the HTTP server is ready, the SP uses the DCM
+registration client to send two requests to the SP API registration endpoint:
+`POST /api/v1alpha1/providers`.


POST /api/v1/providers

You're right — the other enhancement docs (sp-registration-flow, kubevirt-sp) consistently use POST /api/v1/providers. Fixed. Thanks!
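
To make the dual registration concrete, the sketch below builds the two POST /api/v1/providers requests without sending them. The provider names kcli-vm and kcli-cluster and the /vm and /cluster path split come from this conversation; the JSON field names, the hostnames, and the helper itself are hypothetical and do not reflect the actual registration schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// registration is a hypothetical payload shape, not the real DCM schema.
type registration struct {
	Name        string `json:"name"`
	ServiceType string `json:"service_type"`
	Endpoint    string `json:"endpoint"`
}

// buildRegistrations returns one registration request per service type,
// both aimed at the same registration endpoint.
func buildRegistrations(dcmURL, spURL string) ([]*http.Request, error) {
	regs := []registration{
		{Name: "kcli-vm", ServiceType: "vm", Endpoint: spURL + "/vm"},
		{Name: "kcli-cluster", ServiceType: "cluster", Endpoint: spURL + "/cluster"},
	}
	reqs := make([]*http.Request, 0, len(regs))
	for _, reg := range regs {
		body, err := json.Marshal(reg)
		if err != nil {
			return nil, err
		}
		req, err := http.NewRequest(http.MethodPost, dcmURL+"/api/v1/providers", bytes.NewReader(body))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Content-Type", "application/json")
		reqs = append(reqs, req)
	}
	return reqs, nil
}

func main() {
	reqs, err := buildRegistrations("http://dcm.example", "http://kcli-sp.example")
	if err != nil {
		panic(err)
	}
	for _, req := range reqs {
		fmt.Println(req.Method, req.URL)
	}
}
```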

- Add hyperlink to kweb docs in the Summary section
- Fix registration endpoint from /api/v1alpha1/providers to /api/v1/providers to match the sp-registration-flow and kubevirt-sp enhancement docs

Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Per review feedback — the exact number goes stale as tests are added. Signed-off-by: Pau Garcia Quiles <pgarciaq@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

pgarciaq · 2026-05-04T10:57:47Z

@gciavarrini re: the spec-count nit — good catch, the number was stale in both the PR description and the document. I've dropped the exact count from both; the text now just says "Ginkgo specs across 8 suites" without a number, so it won't go stale again as tests are added.

pgarciaq requested review from Fale, croadfeldt, ebichman-1, gabriel-farache, gciavarrini, jenniferubah, machacekondra, pkliczewski and ygalblum as code owners April 22, 2026 17:11

pgarciaq added 2 commits April 22, 2026 22:58

Add DCM UI screenshots from E2E testing

b384240


pgarciaq force-pushed the kcli-sp branch from d7e7222 to b384240 Compare April 22, 2026 20:58

pkliczewski reviewed Apr 24, 2026


pgarciaq added 2 commits April 24, 2026 21:41

pgarciaq force-pushed the kcli-sp branch from 7efe667 to ad9d89c Compare April 25, 2026 19:40

gciavarrini reviewed May 4, 2026


pgarciaq and others added 2 commits May 4, 2026 12:53

kcli-sp: drop exact spec count from Implementation Status

a54f526



