A collection of independent, self-contained modules covering how to package, deploy, serve, and operate machine-learning models on containers and Kubernetes. There is no shared build — each top-level directory is its own project with its own README and tooling. Pick the module you need and follow its README.
| Module | What it is | Start here |
|---|---|---|
| Model-Deployment/ | Flagship Helm chart + CI/CD implementing two ML deployment patterns (deploy-code / deploy-models): independent code/model lifecycles, catalog-segregated model stores, validation/compare gates with SLA load testing, online evaluation, scheduled training jobs, and real-time rollout strategies. | Quick start |
| Helm-Chart/ | Hardened production serving chart (mychart) with per-env values, security context, HPA/PDB, topology spread, and operator scripts for deploy/monitor/rollback. | Quick reference |
| LLM-Inference-vLLM/ | FastAPI + vLLM LLM-serving app with a CPU/mock mode, Prometheus metrics, and a load-test benchmark client. | Quick start |
| Kubernetes/ | Standalone reference manifests (voting-app, cronjob, local model deploy/service, shadow-ingress). | — |
| Docker/ | Docker + ML reference notes, a Dockerfile, and docker-compose. | — |
| Xinference/ | Xinference notebook and notes. | — |
| OpenClaw/ | Small agent Dockerfile and notes. | — |
The Model-Deployment module centers on the idea that code and models have independent lifecycles:
- deploy-code (default) — promote one immutable container image dev→staging→prod; pull the model at runtime by version, updated independently of code. New-model validation + comparison happen in production (canary).
- deploy-models — promote the model artifact through the environments, validated in staging before production; inference/monitoring code rides its own deploy-code track. Suited to one-off / expensive-training models.
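As a rough illustration of the deploy-code idea above (one immutable image, model chosen at runtime by version), the sketch below resolves a model directory from an environment variable. The variable name `MODEL_VERSION`, the helper `resolve_model_path`, and the `store_root/model_name/version` layout are all hypothetical and are not taken from the chart; treat this as a minimal sketch under those assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_model_path(
    store_root: str,
    model_name: str,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return the artifact directory for the model version named in the environment.

    MODEL_VERSION is a hypothetical variable name used only for this sketch;
    the real chart may select the version differently.
    """
    env = os.environ if env is None else env
    version = env.get("MODEL_VERSION")
    if not version:
        # Fail loudly instead of silently serving an arbitrary version.
        raise RuntimeError("MODEL_VERSION is not set")
    # Assumed layout: <store_root>/<model_name>/<version>
    return Path(store_root) / model_name / version
```

Because the version comes from configuration rather than from the image, the same container could serve a new model after only the variable changes, which is the point of keeping the two lifecycles separate.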
See Model-Deployment/README.md for the full guide, and docs/superpowers/ for the design spec and implementation plan behind it.
Model-Deployment/test.sh # render checks → local cluster → install → helm test → CronJob
Model-Deployment/test.sh --render # offline render + lint only (no cluster)

Google Colab is useful for checking CUDA and PyTorch GPU access, but it is not a
full Linux GPU host for validating Docker daemon changes such as
nvidia-ctk runtime configure --runtime=docker and systemctl restart docker.
Use a local Linux NVIDIA machine or GPU VM for the full NVIDIA Container Toolkit
Docker runtime test.
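On such a Linux host, one way to confirm that the runtime was registered after the configure and restart steps is to inspect the runtimes reported by the Docker daemon. The sketch below is not part of this repository. It assumes the toolkit registers a runtime named `nvidia` and that `docker info --format '{{json .Runtimes}}'` prints a JSON object keyed by runtime name. The helper names are hypothetical.

```python
import json
import subprocess


def has_nvidia_runtime(runtimes_json: str) -> bool:
    """Return True if the JSON object of Docker runtimes has an "nvidia" key."""
    runtimes = json.loads(runtimes_json)
    return isinstance(runtimes, dict) and "nvidia" in runtimes


def docker_runtimes_json() -> str:
    """Ask the local Docker daemon for its registered runtimes as JSON.

    Requires a running Docker daemon; raises CalledProcessError otherwise.
    """
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `has_nvidia_runtime(docker_runtimes_json())` on the host should return `True` once the runtime is configured. The parsing step is kept separate so it can be checked without a Docker daemon.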
In a Colab notebook, enable a GPU runtime, then run:
!nvidia-smi

Check PyTorch CUDA access:
import torch
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    x = torch.randn(1024, 1024, device="cuda")
    print((x @ x).shape)

You can also try building the CUDA 12.4.1 Docker image in Colab:
!apt-get update -qq
!apt-get install -y -qq docker.io
!nohup dockerd > /tmp/dockerd.log 2>&1 &
!sleep 10
!docker info
%cd /content/ML-Model-Deployment
!docker build -t gpu-image-test Docker/gpu
!docker run --rm gpu-image-test python -c "import torch; print(torch.__version__, torch.version.cuda)"

This final GPU-in-container check may fail in Colab because Colab usually does not expose a normal host Docker runtime:
!docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

- Work via branch + PR; master is the default branch.
- CI/CD workflows live outside .github/workflows/ (under .github/workflows-helm-chart/ and Model-Deployment/cicd/), so they are run manually rather than auto-triggered.
- See CLAUDE.md for module commands, architecture notes, and the non-obvious conventions in one place.