ML Model Deployment

A collection of independent, self-contained modules covering how to package, deploy, serve, and operate machine-learning models on containers and Kubernetes. There is no shared build: each top-level directory is its own project with its own README and tooling. Pick the module you need and follow its README.

Modules

| Module | What it is | Start here |
|---|---|---|
| Model-Deployment/ | Flagship Helm chart + CI/CD implementing two ML deployment patterns (`deploy-code` / `deploy-models`): independent code/model lifecycles, catalog-segregated model stores, validation/compare gates with SLA load testing, online evaluation, scheduled training jobs, and real-time rollout strategies. | Quick start |
| Helm-Chart/ | Hardened production serving chart (`mychart`) with per-env values, security context, HPA/PDB, topology spread, and operator scripts for deploy/monitor/rollback. | Quick reference |
| LLM-Inference-vLLM/ | FastAPI + vLLM LLM-serving app with a CPU/mock mode, Prometheus metrics, and a load-test benchmark client. | Quick start |
| Kubernetes/ | Standalone reference manifests (voting-app, cronjob, local model deploy/service, shadow-ingress). | - |
| Docker/ | Docker + ML reference notes, a Dockerfile, and docker-compose. | - |
| Xinference/ | Xinference notebook and notes. | - |
| OpenClaw/ | Small agent Dockerfile and notes. | - |

The two deployment patterns

The Model-Deployment module centers on the idea that code and models have independent lifecycles:

`deploy-code` (default): promote one immutable container image through dev → staging → prod. The model is pulled at runtime by version and is updated independently of the code. Validation and comparison of a new model happen in production (canary).
`deploy-models`: promote the model artifact through the environments, validating it in staging before production. The inference and monitoring code rides its own `deploy-code` track. Suited to one-off or expensive-to-train models.
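
To make the "pull the model at runtime by version" idea concrete, here is a minimal sketch of a serving process resolving its model reference from configuration at startup instead of baking it into the image. The environment variable names (`MODEL_NAME`, `MODEL_VERSION`, `MODEL_STORE`), their defaults, and the URI layout are assumptions for illustration only; the actual chart may wire this differently, so treat Model-Deployment/README.md as the source of truth.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRef:
    """Identifies one model version in a model store (hypothetical layout)."""

    name: str
    version: str
    store: str

    @property
    def uri(self) -> str:
        # Assumed layout: <store>/<name>/<version>
        return f"{self.store.rstrip('/')}/{self.name}/{self.version}"


def resolve_model_ref(env=None) -> ModelRef:
    """Read the model reference from the environment, not from the image.

    The variable names and defaults below are illustrative assumptions.
    """
    env = os.environ if env is None else env
    return ModelRef(
        name=env.get("MODEL_NAME", "example-model"),
        version=env.get("MODEL_VERSION", "1"),
        store=env.get("MODEL_STORE", "s3://models-dev"),
    )


if __name__ == "__main__":
    # The same image can serve a different model version by changing
    # configuration only; no rebuild is involved.
    print(resolve_model_ref().uri)
```

Under these assumptions, rolling a new model version forward becomes a configuration change against an unchanged image, which is what lets the two lifecycles move independently.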

See Model-Deployment/README.md for the full guide, and docs/superpowers/ for the design spec and implementation plan behind it.

Try the flagship module in one command

Model-Deployment/test.sh            # render checks → local cluster → install → helm test → CronJob
Model-Deployment/test.sh --render   # offline render + lint only (no cluster)

Test GPU access in Colab

Google Colab is useful for checking CUDA and PyTorch GPU access, but it is not a full Linux GPU host, so it is not suitable for validating Docker daemon changes such as `nvidia-ctk runtime configure --runtime=docker` and `systemctl restart docker`. Use a local Linux NVIDIA machine or a GPU VM for the full NVIDIA Container Toolkit Docker runtime test.

In a Colab notebook, enable a GPU runtime, then run:

!nvidia-smi

Check PyTorch CUDA access:

import torch

print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    x = torch.randn(1024, 1024, device="cuda")
    print((x @ x).shape)
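
If you would rather collect the same checks as a structured result (for example to log it from a notebook or a smoke test), the following sketch wraps them in a function. The function name `gpu_report` and the shape of the returned dictionary are assumptions, not part of this repository. It also calls `torch.cuda.synchronize()`, because CUDA kernels launch asynchronously and a failure may otherwise only surface on a later call.

```python
import importlib.util


def gpu_report() -> dict:
    """Return a small summary of PyTorch/CUDA availability (hypothetical helper)."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "device_name": None,
    }
    if not report["torch_installed"]:
        return report

    import torch  # imported lazily so the helper also runs without PyTorch

    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        x = torch.randn(1024, 1024, device="cuda")
        y = x @ x
        # Wait for the matmul to finish so any CUDA error is raised here.
        torch.cuda.synchronize()
        report["matmul_shape"] = tuple(y.shape)
    return report


if __name__ == "__main__":
    print(gpu_report())
```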

You can also try building the CUDA 12.4.1 Docker image in Colab:

!apt-get update -qq
!apt-get install -y -qq docker.io
!nohup dockerd > /tmp/dockerd.log 2>&1 &
!sleep 10
!docker info

%cd /content/ML-Model-Deployment
!docker build -t gpu-image-test Docker/gpu
!docker run --rm gpu-image-test python -c "import torch; print(torch.__version__, torch.version.cuda)"

This final GPU-in-container check may fail in Colab because Colab usually does not expose a normal host Docker runtime:

!docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
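
On a real Linux GPU host, one way to confirm that the `nvidia-ctk runtime configure --runtime=docker` step took effect is to look for an `nvidia` entry among the runtimes that `docker info` reports. The sketch below does that from Python; the helper names are hypothetical, and it assumes the runtime is registered under the name `nvidia`. A missing entry does not prove that GPU containers cannot run, so treat this as a quick diagnostic rather than a definitive test.

```python
import json
import shutil
import subprocess


def has_nvidia_runtime(runtimes_json: str) -> bool:
    """Return True if the JSON runtimes map contains an 'nvidia' entry."""
    runtimes = json.loads(runtimes_json) or {}
    return "nvidia" in runtimes


def docker_runtimes_json(timeout: float = 10.0):
    """Return the daemon's runtimes as a JSON string, or None if unavailable."""
    if shutil.which("docker") is None:
        return None
    try:
        result = subprocess.run(
            ["docker", "info", "--format", "{{json .Runtimes}}"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    raw = docker_runtimes_json()
    if raw is None:
        print("docker is not installed or the daemon is not reachable")
    else:
        print("nvidia runtime registered:", has_nvidia_runtime(raw))
```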

Repo conventions

Work via branch + PR; master is the default branch.
CI/CD workflows live outside .github/workflows/ (under .github/workflows-helm-chart/ and Model-Deployment/cicd/), so they are run manually rather than auto-triggered.
See CLAUDE.md for module commands, architecture notes, and the non-obvious conventions in one place.
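
Because GitHub Actions only discovers workflows under .github/workflows/, the files in the two directories above are easy to overlook. As a small convenience, this sketch lists them from a checkout. The helper name and the assumption that workflow files end in `.yml` or `.yaml` are illustrative and not confirmed by the repository.

```python
from pathlib import Path

# Directories named in the conventions above.
WORKFLOW_DIRS = (".github/workflows-helm-chart", "Model-Deployment/cicd")


def find_manual_workflows(repo_root="."):
    """List YAML files under the manual-workflow directories (hypothetical helper)."""
    root = Path(repo_root)
    found = []
    for rel in WORKFLOW_DIRS:
        base = root / rel
        if not base.is_dir():
            continue  # tolerate partial checkouts
        for path in base.rglob("*"):
            # Assumption: workflow definitions use a .yml or .yaml suffix.
            if path.is_file() and path.suffix in {".yml", ".yaml"}:
                found.append(path.relative_to(root).as_posix())
    return sorted(found)


if __name__ == "__main__":
    for workflow in find_manual_workflows():
        print(workflow)
```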
