docs: add CoreWeave full stack examples #1
Reviewer's Guide
This PR adds a comprehensive set of documentation and end-to-end examples demonstrating CoreWeave’s full-stack integration, covering SUNK, Tensorizer, vLLM, observability, CI/CD, and security, along with Helm, Knative, and ArgoCD deployment manifests and a Python serialization example.
Sequence diagram for tensorized model serialization, upload, serving, and lazy loading
sequenceDiagram
actor User
participant TensorizerScript
participant S3
participant HTTPServer
participant vLLM
User->>TensorizerScript: Run serialize_and_load.py
TensorizerScript->>TensorizerScript: serialize(model_id, out_path)
TensorizerScript->>S3: upload_to_s3(out_path, bucket, key) (optional)
alt Serve over HTTP
TensorizerScript->>HTTPServer: serve_file(out_path, port)
end
TensorizerScript->>TensorizerScript: load(uri, device, num_readers)
TensorizerScript->>vLLM: vLLM loads tensorized weights via TensorDeserializer
vLLM->>User: Model ready for inference
Class diagram for Tensorizer serialization and deserialization example
classDiagram
class TensorSerializer {
+__init__(out_path)
+write_module(model)
}
class TensorDeserializer {
+__init__(uri, device, lazy_load, num_readers)
+load_into_module(model)
}
class AutoModelForCausalLM {
+from_pretrained(model_id)
}
class serialize_and_load {
+serialize(model_id, out_path)
+upload_to_s3(path, bucket, key)
+serve_file(path, port)
+load(uri, device, num_readers)
+main()
}
serialize_and_load --> TensorSerializer : uses
serialize_and_load --> TensorDeserializer : uses
serialize_and_load --> AutoModelForCausalLM : uses
TensorSerializer --> AutoModelForCausalLM : serializes
TensorDeserializer --> AutoModelForCausalLM : loads
File-Level Changes
Walkthrough
Adds CI/CD via GitHub Actions to build, scan, push, and deploy a vLLM image with Helm. Introduces Helm chart, Argo CD app, and Knative Service. Provides extensive docs (overview, SUNK, Tensorizer, vLLM, observability, CI/CD, security) and runnable examples (tensorizer script, vLLM smoke test, SUNK Slurm pod, Grafana walkthrough).
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Dev as Developer
participant GH as GitHub Actions
participant D as Docker/Buildx
participant S as Trivy
participant R as GHCR
participant H as Helm
participant K as Kubernetes
Dev->>GH: Push to main
GH->>D: Build image (vLLM)
D-->>GH: Image built (tag: example/vllm:${sha})
GH->>S: Scan image
S-->>GH: Scan results (pass)
GH->>R: Login & push image
GH->>H: helm upgrade --install tensorizer-vllm
H->>K: Apply Deployment/Service/Ingress
K-->>H: Resources ready
H-->>GH: Deploy complete
sequenceDiagram
autonumber
actor U as User
participant I as Ingress/Knative
participant Svc as Service
participant Pod as vLLM Pod
note over Pod: vllm serve --tensorizer<br/>--model s3://... or http://...
U->>I: HTTP request (/generate)
I->>Svc: Route request
Svc->>Pod: Forward to port 8000
Pod-->>U: Response (tokens, metrics at /metrics)
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Hey there - I've reviewed your changes - here's some feedback:
Blocking issues:
- An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload. (link)
- An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload. (link)
- An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload. (link)
General comments:
- In examples/tensorizer/serialize_and_load.py, avoid using os.chdir() to set the serve directory and instead supply the directory parameter to SimpleHTTPRequestHandler to prevent global working-directory side effects.
- Add logic to gracefully shut down the HTTP server thread after load completes in serialize_and_load.py to avoid leaving orphan background threads.
- Verify that all relative links in the new docs (for example in schedule-k8s-with-slurm.md) resolve correctly in the rendered site to prevent broken navigation.
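A minimal sketch of the first two general comments, assuming Python 3.7+ (where `SimpleHTTPRequestHandler` accepts a `directory` argument). The helper names `serve_directory`, `stop_server`, and `demo` are hypothetical and not part of serialize_and_load.py, and the sketch binds to 127.0.0.1 rather than 0.0.0.0 to keep the demo local:

```python
import functools
import os
import tempfile
import threading
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory: str, port: int = 0):
    """Serve `directory` over HTTP without changing the process working directory."""
    # Bind the directory to the handler instead of calling os.chdir().
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


def stop_server(server, thread) -> None:
    """Stop serve_forever(), release the socket, and wait for the thread to exit."""
    server.shutdown()
    server.server_close()
    thread.join(timeout=5)


def demo() -> bytes:
    cwd_before = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "model.tensors"), "wb") as fh:
            fh.write(b"fake-weights")
        server, thread = serve_directory(tmp)  # port 0: the OS picks a free port
        try:
            port = server.server_address[1]
            # Skip any configured HTTP proxy for this local request.
            opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
            url = f"http://127.0.0.1:{port}/model.tensors"
            with opener.open(url, timeout=5) as resp:
                body = resp.read()
        finally:
            stop_server(server, thread)
    assert os.getcwd() == cwd_before  # working directory was never changed
    assert not thread.is_alive()  # no orphan background thread
    return body
```

Returning the server alongside the thread is what makes a clean shutdown possible; the original `serve_file` returns only the thread, so the caller has no handle to call `shutdown()` on.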
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In examples/tensorizer/serialize_and_load.py, avoid using os.chdir() to set the serve directory and instead supply the directory parameter to SimpleHTTPRequestHandler to prevent global working-directory side effects.
- Add logic to gracefully shut down the HTTP server thread after load completes in serialize_and_load.py to avoid leaving orphan background threads.
- Verify that all relative links in the new docs (for example in schedule-k8s-with-slurm.md) resolve correctly in the rendered site to prevent broken navigation.
## Individual Comments
### Comment 1
<location> `examples/tensorizer/serialize_and_load.py:31` </location>
<code_context>
+    s3.upload_file(path, bucket, key)
+
+
+def serve_file(path: str, port: int) -> threading.Thread:
+    directory = os.path.dirname(os.path.abspath(path))
+    os.chdir(directory)
+    server = ThreadingHTTPServer(("0.0.0.0", port), SimpleHTTPRequestHandler)
+    thread = threading.Thread(target=server.serve_forever, daemon=True)
</code_context>
<issue_to_address>
Changing the working directory with os.chdir may have unintended side effects.
Using os.chdir can impact other threads or components. To avoid this, use the 'directory' parameter in SimpleHTTPRequestHandler if available.
</issue_to_address>
### Comment 2
<location> `examples/tensorizer/serialize_and_load.py:24` </location>
<code_context>
+    serializer.write_module(model)
+
+
+def upload_to_s3(path: str, bucket: str, key: str) -> None:
+    import boto3
+
+    s3 = boto3.client("s3")
+    s3.upload_file(path, bucket, key)
+
+
</code_context>
<issue_to_address>
No error handling for S3 upload failures.
Add error handling to manage S3 upload exceptions and consider implementing retry logic for transient errors.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
def upload_to_s3(path: str, bucket: str, key: str) -> None:
    import boto3
    s3 = boto3.client("s3")
    s3.upload_file(path, bucket, key)
=======
import time
def upload_to_s3(path: str, bucket: str, key: str, max_retries: int = 3, delay: float = 2.0) -> None:
    import boto3
    import botocore.exceptions
    s3 = boto3.client("s3")
    attempt = 0
    while attempt < max_retries:
        try:
            s3.upload_file(path, bucket, key)
            print(f"Successfully uploaded {path} to s3://{bucket}/{key}")
            return
        except botocore.exceptions.BotoCoreError as e:
            attempt += 1
            print(f"S3 upload failed (attempt {attempt}/{max_retries}): {e}")
            if attempt < max_retries:
                print(f"Retrying in {delay} seconds...")
                time.sleep(delay)
            else:
                print("Max retries reached. Upload failed.")
                raise
        except Exception as e:
            print(f"Unexpected error during S3 upload: {e}")
            raise
>>>>>>> REPLACE
</suggested_fix>
### Comment 3
<location> `helm/tensorizer-vllm/templates/service.yaml:9` </location>
<code_context>
+  selector:
+    app: {{ .Release.Name }}
+  ports:
+    - port: 80
+      targetPort: 8000
</code_context>
<issue_to_address>
Mapping port 80 to 8000 may cause confusion or conflicts.
To prevent potential port conflicts, make the service port configurable or choose a less common default port.
</issue_to_address>
## Security Issues
### Issue 1
<location> `.github/workflows/build-and-deploy.yml:13` </location>
<issue_to_address>
**security (yaml.github-actions.security.third-party-action-not-pinned-to-commit-sha):** An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload.
*Source: opengrep*
</issue_to_address>
### Issue 2
<location> `.github/workflows/build-and-deploy.yml:17` </location>
<issue_to_address>
**security (yaml.github-actions.security.third-party-action-not-pinned-to-commit-sha):** An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload.
*Source: opengrep*
</issue_to_address>
### Issue 3
<location> `.github/workflows/build-and-deploy.yml:21` </location>
<issue_to_address>
**security (yaml.github-actions.security.third-party-action-not-pinned-to-commit-sha):** An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload.
*Source: opengrep*
</issue_to_address>
def serve_file(path: str, port: int) -> threading.Thread:
    directory = os.path.dirname(os.path.abspath(path))
    os.chdir(directory)
issue (bug_risk): Changing the working directory with os.chdir may have unintended side effects.
Using os.chdir can impact other threads or components. To avoid this, use the 'directory' parameter in SimpleHTTPRequestHandler if available.
def upload_to_s3(path: str, bucket: str, key: str) -> None:
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(path, bucket, key)
suggestion: No error handling for S3 upload failures.
Add error handling to manage S3 upload exceptions and consider implementing retry logic for transient errors.
Suggested change:
-def upload_to_s3(path: str, bucket: str, key: str) -> None:
-    import boto3
-    s3 = boto3.client("s3")
-    s3.upload_file(path, bucket, key)
+import time
+def upload_to_s3(path: str, bucket: str, key: str, max_retries: int = 3, delay: float = 2.0) -> None:
+    import boto3
+    import botocore.exceptions
+    s3 = boto3.client("s3")
+    attempt = 0
+    while attempt < max_retries:
+        try:
+            s3.upload_file(path, bucket, key)
+            print(f"Successfully uploaded {path} to s3://{bucket}/{key}")
+            return
+        except botocore.exceptions.BotoCoreError as e:
+            attempt += 1
+            print(f"S3 upload failed (attempt {attempt}/{max_retries}): {e}")
+            if attempt < max_retries:
+                print(f"Retrying in {delay} seconds...")
+                time.sleep(delay)
+            else:
+                print("Max retries reached. Upload failed.")
+                raise
+        except Exception as e:
+            print(f"Unexpected error during S3 upload: {e}")
+            raise
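One way to structure the retry logic requested for the S3 upload, sketched with the standard library only so it runs without AWS credentials. `retry_with_backoff` is a hypothetical helper, and which exceptions count as transient is left to the caller. As far as I know, boto3's `upload_file` reports failed transfers as `boto3.exceptions.S3UploadFailedError`, which is not a `BotoCoreError` subclass; the commented usage assumes that and should be checked against your boto3 version:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error unchanged
            # Exponential backoff with up to 10% jitter: ~base, ~2*base, ~4*base, ...
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("max_attempts must be at least 1")


# Hypothetical wiring for upload_to_s3 (assumes boto3 is installed):
#   import boto3
#   from boto3.exceptions import S3UploadFailedError
#   s3 = boto3.client("s3")
#   retry_with_backoff(lambda: s3.upload_file(path, bucket, key),
#                      retry_on=(S3UploadFailedError,))
```

Exceptions outside `retry_on` propagate immediately, so configuration errors such as missing credentials are not retried.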
  - port: 80
    targetPort: 8000
suggestion (bug_risk): Mapping port 80 to 8000 may cause confusion or conflicts.
To prevent potential port conflicts, make the service port configurable or choose a less common default port.
steps:
  - uses: actions/checkout@v4
  - name: Set up Docker
    uses: docker/setup-buildx-action@v2
security (yaml.github-actions.security.third-party-action-not-pinned-to-commit-sha): An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload.
Source: opengrep
  - name: Build
    run: docker build -t example/vllm:${{ github.sha }} .
  - name: Scan
    uses: aquasecurity/trivy-action@0.20.0
security (yaml.github-actions.security.third-party-action-not-pinned-to-commit-sha): An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload.
Source: opengrep
    with:
      image-ref: example/vllm:${{ github.sha }}
  - name: Login
    uses: docker/login-action@v3
security (yaml.github-actions.security.third-party-action-not-pinned-to-commit-sha): An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload.
Source: opengrep
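The pinning rule flagged three times above can be checked locally before pushing. A rough sketch, not the opengrep rule itself: it scans workflow text for `uses:` references whose ref is not a 40-character hex SHA. Treating the `actions` and `github` owners as first-party is an assumption of this sketch, and `unpinned_actions` is a hypothetical helper:

```python
import re
from typing import List

# Matches `uses: owner/repo[/path]@ref`, with or without a leading "- ".
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([\w.-]+)/([\w./-]+)@(\S+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")
# Assumption: actions owned by these accounts are treated as first-party.
FIRST_PARTY_OWNERS = {"actions", "github"}


def unpinned_actions(workflow_text: str) -> List[str]:
    """Return third-party `uses:` references not pinned to a full-length commit SHA."""
    findings = []
    for owner, repo, ref in USES_RE.findall(workflow_text):
        if owner in FIRST_PARTY_OWNERS:
            continue
        if not FULL_SHA_RE.fullmatch(ref):
            findings.append(f"{owner}/{repo}@{ref}")
    return findings


# Workflow fragment assembled from the snippets quoted in the review comments.
WORKFLOW = """\
steps:
  - uses: actions/checkout@v4
  - name: Set up Docker
    uses: docker/setup-buildx-action@v2
  - name: Scan
    uses: aquasecurity/trivy-action@0.20.0
  - name: Login
    uses: docker/login-action@v3
"""

for finding in unpinned_actions(WORKFLOW):
    print("not pinned to a commit SHA:", finding)
```

On this fragment the sketch reports the same three actions that the security review flagged.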
Actionable comments posted: 24
🧹 Nitpick comments (43)
k8s/knative-service.yaml (4)
13-13: Be explicit about the entrypoint
Assuming the image ENTRYPOINT is `vllm` may fail. Set command to `["vllm"]` so `args: ["serve", ...]` is valid.

    - args: ["serve", "--model", "s3://my-bucket/models/tiny-gpt2.tensors", "--tensorizer"]
    + command: ["vllm"]
    + args: ["serve", "--model", "s3://my-bucket/models/tiny-gpt2.tensors", "--tensorizer"]
14-15: Add health probes to avoid flapping revisions
Knative benefits from readiness/liveness probes; vLLM exposes an HTTP endpoint, so wire probes to reduce cold/fail traffic.

      ports:
        - containerPort: 8000
    + readinessProbe:
    +   httpGet: { path: /health, port: 8000 }
    +   initialDelaySeconds: 10
    +   periodSeconds: 5
    + livenessProbe:
    +   httpGet: { path: /health, port: 8000 }
    +   initialDelaySeconds: 30
    +   periodSeconds: 10

Adjust path if your vLLM build uses a different health route.
8-9: Cold starts likely with minScale=0
Scale-to-zero is fine for cost, but adds latency on first request. For demos/benchmarks set minScale=1.

    - autoscaling.knative.dev/minScale: "0"
    + autoscaling.knative.dev/minScale: "1"
11-15: Harden the pod
Consider non-root, read-only FS, and dropped capabilities.

      - image: vllm/vllm:latest
    +   securityContext:
    +     runAsNonRoot: true
    +     allowPrivilegeEscalation: false
    +     readOnlyRootFilesystem: true
    +     capabilities: { drop: ["ALL"] }

examples/observability/grafana/README.md (2)
12-15: Make the “open Grafana” step cross‑platform
`open` works on macOS only. Prefer echoing the URL and try xdg-open/open/start.

    -docker compose up -d
    -open http://localhost:3000
    +docker compose up -d
    +echo "Grafana: http://localhost:3000"
    +# Try to open in a browser (Linux/macOS/Windows PowerShell)
    +xdg-open http://localhost:3000 2>/dev/null || open http://localhost:3000 2>/dev/null || start http://localhost:3000 2>/dev/null || true

3-8: Tighten wording and metric names (minor)
Optional polish for readability; no content change.

    -Open the CoreWeave Grafana instance and inspect the following dashboards while
    -running the vLLM demo:
    +Open the CoreWeave Grafana instance and, while running the vLLM demo, inspect:

examples/vllm/run_vllm_tensorized.sh (2)
8-14: Add readiness wait, JSON header, and robust cleanup
Prevents flakiness from fixed sleep, ensures correct content type, and cleans up the server on exit.

    -vllm serve --model "$MODEL_URI" --tensorizer --port "$PORT" &
    -SERVER_PID=$!
    -
    -sleep 5
    -curl -sS http://localhost:$PORT/generate -d '{"prompt":"Hello","max_tokens":8}'
    -
    -kill $SERVER_PID
    +vllm serve --model "$MODEL_URI" --tensorizer --port "$PORT" &
    +SERVER_PID=$!
    +cleanup() { kill "$SERVER_PID" 2>/dev/null || true; wait "$SERVER_PID" 2>/dev/null || true; }
    +trap cleanup EXIT
    +
    +# Wait up to ~30s for the port to accept connections
    +for _ in {1..30}; do
    +  if timeout 1 bash -c ">/dev/tcp/127.0.0.1/$PORT" 2>/dev/null; then break; fi
    +  sleep 1
    +done
    +
    +curl -fsS "http://localhost:$PORT/generate" \
    +  -H 'Content-Type: application/json' \
    +  -d '{"prompt":"Hello","max_tokens":8}'
5-6: Validate prerequisites (optional)
Fail fast if vllm CLI is missing or port is occupied.

     MODEL_URI=${1:-s3://my-bucket/models/tiny-gpt2.tensors}
     PORT=${PORT:-8000}
    +command -v vllm >/dev/null || { echo "vllm CLI not found on PATH"; exit 1; }

docs/overview.md (2)
3-5: Minor wording/hyphenation
Hyphenate “CoreWeave‑aligned”; no semantic change.

    -This repository demonstrates a CoreWeave aligned stack for high performance model serving.
    +This repository demonstrates a CoreWeave‑aligned stack for high‑performance model serving.
14-16: Verify the Quickstart is offline‑friendly
The prior test run hit a ProxyError. If the demo fetches remote artifacts, add a note about offline mode and credentials, or switch to a purely local artifact path.
docs/observability.md (1)
7-11: Make Grafana navigation unambiguous (optional)
Name a dashboard explicitly to reduce guesswork (e.g., “Kubernetes / Workloads / Pods”).
README.md (2)
1-1: Deduplicate H1 title.
Two top-level “# tensorizer” headers (Line 1 here and Line 22 below) read oddly. Keep one.
13-20: Quickstart links look good. Minor polish optional.
Consider adding brief one-line descriptions after each link for scannability.
examples/tensorizer/serialize_and_load.py (2)
63-73: Small readiness race when starting the HTTP server.
Add a short wait or probe to avoid connection refusals on fast clients.
Apply:

     else:
         port = 8000
         serve_file(out_path, port)
    +    import time
    +    time.sleep(0.2)  # give server a moment to start
         uri = f"http://localhost:{port}/{os.path.basename(out_path)}"
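A fixed `time.sleep(0.2)` narrows the race but does not close it; probing the port does. A minimal standard-library sketch, where `wait_for_port` is a hypothetical helper rather than part of the script:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.1) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the server socket is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In the script this would sit between `serve_file(out_path, port)` and building the URI, for example `wait_for_port("localhost", port, timeout=5)`, with a failure raised if it returns False.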
24-29: S3 upload lacks basic error handling.
Wrap boto3 call to surface actionable errors for missing creds/bucket.
Apply:

     def upload_to_s3(path: str, bucket: str, key: str) -> None:
         import boto3
    -
    -    s3 = boto3.client("s3")
    -    s3.upload_file(path, bucket, key)
    +    from botocore.exceptions import BotoCoreError, ClientError
    +    s3 = boto3.client("s3")
    +    try:
    +        s3.upload_file(path, bucket, key)
    +    except (BotoCoreError, ClientError) as e:
    +        raise RuntimeError(f"Failed to upload {path} to s3://{bucket}/{key}: {e}") from e

docs/tensorizer.md (3)
19-20: Align num_readers with script default or mention the flag.
Either show num_readers=4 (default) or note that you can set --num-readers 8.
Apply:

    -3. `TensorDeserializer(..., device=..., lazy_load=True, num_readers=8)` streams
    +3. `TensorDeserializer(..., device=..., lazy_load=True, num_readers=4)` streams
    +# Use `--num-readers 8` to increase concurrency if your source supports range requests.
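For intuition on why higher `num_readers` values depend on range-request support: each concurrent reader needs its own byte range of the file. The sketch below is illustrative only and is not how Tensorizer itself partitions reads; `split_byte_ranges` and `range_header` are hypothetical helpers:

```python
from typing import List, Tuple


def split_byte_ranges(total_size: int, num_readers: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into up to `num_readers` contiguous inclusive ranges."""
    if total_size <= 0 or num_readers <= 0:
        raise ValueError("total_size and num_readers must be positive")
    readers = min(num_readers, total_size)  # never hand out empty ranges
    base, extra = divmod(total_size, readers)
    ranges = []
    start = 0
    for i in range(readers):
        # The first `extra` readers take one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start: int, end: int) -> str:
    """Format an HTTP Range header value; both bounds are inclusive."""
    return f"bytes={start}-{end}"
```

A source that ignores the `Range` header would return the whole file to every reader, which is why the extra concurrency only pays off when the backend honors range requests.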
21-22: Fix product name capitalization.
Knative (not KNative).
Apply:

    -4. KNative/KServe benefit from faster cold starts because weights are fetched
    +4. Knative/KServe benefit from faster cold starts because weights are fetched
24-24: Optional: qualify the throughput claim.
Consider adding “depending on model/dtype and storage backend” to set expectations.
helm/tensorizer-vllm/Chart.yaml (1)
1-5: Mark chart type for Helm v2 schema.
Explicitly set type: application.
Apply:

     apiVersion: v2
     name: tensorizer-vllm
     version: 0.1.0
     appVersion: "0.1.0"
     description: Deploy vLLM serving tensorized models
    +type: application

helm/tensorizer-vllm/templates/service.yaml (1)
3-10: Improve Service metadata and port naming.
Add standard labels and a named port for clarity/prometheus scraping.
Apply:

     metadata:
       name: {{ .Release.Name }}
    +  labels:
    +    app.kubernetes.io/name: {{ .Release.Name }}
     spec:
       selector:
         app: {{ .Release.Name }}
       ports:
    -    - port: 80
    +    - name: http
    +      port: 80
           targetPort: 8000

helm/tensorizer-vllm/templates/deployment.yaml (1)
5-14: Consider making resources configurable via values.
Avoid hardcoding; let users set CPU/memory/GPU.
Example (template-side):

     spec:
       replicas: 1
       selector:
         matchLabels:
           app: {{ .Release.Name }}
       template:
         metadata:
           labels:
             app: {{ .Release.Name }}
         spec:
           containers:
             - name: vllm
    +          resources:
    +            {{- toYaml .Values.resources | nindent 12 }}

And add in values.yaml:

    resources:
      requests: { cpu: "1", memory: "4Gi" }
      limits: { cpu: "2", memory: "8Gi" }
      # limits:
      #   nvidia.com/gpu: 1  # if GPU required

docs/schedule-k8s-with-slurm.md (1)
16-19: Add language to fenced code block (markdownlint MD040).

    -```
    +```text
     NAME          READY   STATUS    RESTARTS   AGE
     slurm-pod-0   1/1     Running   0          1m

examples/sunk/slurm-pod/pod.sbatch (1)
1-5: Optional: wait for readiness and clean up.
Improves UX for demos.

     #SBATCH --output=slurm-pod.log
    -srun kubectl run slurm-pod-0 --image=busybox --restart=Never --command -- sh -c 'echo hello-world; sleep 30'
    +srun kubectl run slurm-pod-0 --image=busybox --restart=Never --labels=job-name=slurm-pod --command -- sh -c 'echo hello-world; sleep 30'
    +kubectl wait --for=condition=Ready pod/slurm-pod-0 --timeout=60s || true
    +# Optional: show logs, then clean up
    +kubectl logs slurm-pod-0 || true
    +kubectl delete pod slurm-pod-0 --ignore-not-found

examples/sunk/slurm-pod/README.md (3)
13-17: Make the run flow deterministic and observable.
Capture the job ID, wait for the Pod readiness by label, then fetch logs. This avoids races.

    -sbatch pod.sbatch
    -squeue -u $USER
    -kubectl get pods -l job-name=slurm-pod
    +JOBID=$(sbatch --parsable pod.sbatch)
    +squeue -j "$JOBID"
    +kubectl wait --for=condition=Ready pod -l job-name=slurm-pod --timeout=120s
    +kubectl logs slurm-pod-0

19-24: Tighten success criteria phrasing.
Use consistent casing and precise conditions.

    -- `squeue` shows the job in `RUNNING`
    -- `kubectl get pods` shows the pod in `Running`
    -- `kubectl logs slurm-pod-0` prints `hello-world`
    +- `squeue` shows the job as `RUNNING`
    +- `kubectl get pods` shows the Pod `Running` (or `Ready` after the wait)
    +- `kubectl logs slurm-pod-0` includes `hello-world`

25-29: Ensure full cleanup (Slurm job and spawned Pod).
Canceling the job may leave the Pod. Offer a label-based delete to avoid leaks.

    -scancel <jobid>
    +scancel <jobid>
    +kubectl delete pod -l job-name=slurm-pod --ignore-not-found

gitops/argocd/app.yaml (2)
6-9: Create target namespace automatically.
Default namespace is fine for demos, but creating a dedicated namespace improves isolation.

     destination:
       server: https://kubernetes.default.svc
    -  namespace: default
    +  namespace: vllm

And enable CreateNamespace:

     syncPolicy:
       automated:
         prune: true
         selfHeal: true
    +  syncOptions:
    +    - CreateNamespace=true
13-17: Add basic health and retry for stability.
Self-heal is on; add retries to avoid transient failures bricking syncs.

     project: default
     syncPolicy:
       automated:
         prune: true
         selfHeal: true
    +  retry:
    +    limit: 5
    +    backoff:
    +      duration: 5s
    +      factor: 2
    +      maxDuration: 3m

docs/cicd.md (3)
7-11: Clarify Quickstart sequencing and fix wrapping.
Combine the wrapped sentence and explicitly mention registry push.

    -2. The workflow builds the image with `docker build` and scans it with Trivy.
    -3. If successful the image is pushed to the registry and Helm is upgraded in a
    -test namespace.
    +2. The workflow builds the image with `docker build` and scans it with Trivy.
    +3. On success, it pushes the image to the registry and upgrades the Helm release in a test namespace.
14-18: Verify the workflow actually implements Cosign/Syft/Trivy gates.
Docs mention OIDC auth, External Secrets, GitOps commits, Cosign, and SBOMs. Ensure `.github/workflows/build-and-deploy.yml` has these steps and required permissions (e.g., `contents: write` for GitOps commits; OIDC to GHCR; Cosign keyless).
I can align the workflow with the doc (Cosign keyless sign, Syft SBOM upload, Trivy PR gate) if you want a patch.
14-16: Name the exact secrets and permissions.
Minimal additions help users succeed.
Add a short list, for example:
- Required permissions: id-token: write; contents: write; packages: write.
- Required secrets (if not using keyless): COSIGN_PRIVATE_KEY, COSIGN_PASSWORD (or use keyless).
- External Secrets references for registry creds if pushing outside GHCR.
docs/vllm.md (3)
8-12: Quickstart: call out credentials and port.
S3 URIs require credentials and the server binds a port; add one-liners to reduce first‑run failures.

    -bash examples/vllm/run_vllm_tensorized.sh s3://my-bucket/models/tiny-gpt2.tensors
    +# Set credentials if using S3
    +export AWS_REGION=us-east-1
    +export AWS_ACCESS_KEY_ID=...; export AWS_SECRET_ACCESS_KEY=...
    +# Launch
    +bash examples/vllm/run_vllm_tensorized.sh s3://my-bucket/models/tiny-gpt2.tensors
    +# Default API is on port 8000
    +curl -s localhost:8000/v1/models || true

16-23: Confirm flag support and provide tuning pointers.
`--tensorizer` and env names can drift across versions. Ask users to match their vLLM version and add a basic throughput knob.

    -1. `vllm serve --tensorizer` reads weights from disk, HTTP, or S3.
    +1. `vllm serve --tensorizer` reads weights from disk, HTTP, or S3 (verify your vLLM version supports this flag).
     2. Environment variables like `VLLM_WORKER_GPU_MEMORY_UTILIZATION` tune throughput vs. memory usage.
    +   - Also consider `VLLM_MAX_MODEL_LEN` and `VLLM_CPU_OFFLOAD_GB` per GPU memory.
     3. Prometheus metrics at `/metrics` expose time‑to‑first‑token and tokens/sec.
20-22: Link chart usage to values that matter for vLLM.
Mention key values so users don’t hunt.
Add a sentence after the Helm reference:
- Important values: `image`, `modelURI`, `resources` (GPU), `replicas`, and `env` (VLLM_*).

docs/sunk.md (4)
3-5: Grammar and brevity.
Remove double spaces and tighten phrasing.

    -SUNK runs Slurm control and worker nodes inside Kubernetes Pods. Slurm and
    -native Kubernetes workloads can share the same cluster while maintaining
    -isolation.
    +SUNK runs Slurm control and worker nodes inside Kubernetes Pods. Slurm and native Kubernetes workloads share the same cluster while maintaining isolation.
22-29: Install into a dedicated namespace and wait for readiness.
Improves reproducibility and avoids racing job submission.

    -# install SUNK operator
    -helm repo add sunk https://coreweave.github.io/sunk
    -helm install sunk sunk/sunk-operator
    +# install SUNK operator
    +helm repo add sunk https://coreweave.github.io/sunk
    +helm install sunk sunk/sunk-operator -n sunk-system --create-namespace
    +kubectl -n sunk-system rollout status deploy/sunk-operator --timeout=120s

33-36: Clarify K8s plugin requirement.
Make it explicit the Slurm Kubernetes plugin is needed to launch native Pods.

    -3. Slurm's Kubernetes plugin can launch native Pods alongside batch jobs.
    +3. With the Slurm Kubernetes plugin enabled, Slurm can launch native Pods alongside batch jobs.
38-39: Link text polish.
Add a comma for flow and simplify the link target (folder is enough).

    -For a demo of creating a Kubernetes Pod from Slurm see
    -[examples/sunk/slurm-pod](../examples/sunk/slurm-pod/README.md).
    +For a demo of creating a Kubernetes Pod from Slurm, see
    +[examples/sunk/slurm-pod](../examples/sunk/slurm-pod/).

security/policy-notes.md (1)
35-38: Image-signing: add concrete commands and admission policy example.
Make the guidance actionable with signing/attestation and policy verification.
Append:

    # Build SBOM and attach
    syft packages --source app-image:latest -o spdx-json > sbom.spdx.json
    cosign attach sbom --sbom sbom.spdx.json app-image:latest

    # Keyless sign (OIDC)
    COSIGN_YES=true cosign sign app-image:latest

    # Kyverno verifyImages (keyless) – example
    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: verify-signed-images
    spec:
      rules:
        - name: require-cosign
          match:
            any:
              - resources:
                  kinds: ["Pod"]
          verifyImages:
            - imageReferences: ["ghcr.io/*/*"]
              attestations:
                - type: spdx
              keyless:
                issuer: "https://token.actions.githubusercontent.com"
                subject: "repo:${{ org }}/*"

Nit: remove double spaces after periods for consistency (“Cosign). Generate SBOMs…”).
.github/workflows/build-and-deploy.yml (1)
11-27: Harden workflow: pin actions, cache builds, and fail on high vulns.
Supply-chain and performance improvements.
- Pin actions to commit SHAs.
- Use docker/build-push-action with GHA cache (type=gha).
- Configure Trivy to fail on HIGH/CRITICAL and ignore-unfixed as needed:
       - name: Scan
         uses: aquasecurity/trivy-action@0.20.0
         with:
    -      image-ref: ${{ env.IMAGE }}
    +      image-ref: ${{ env.IMAGE }}
    +      vuln-type: 'os,library'
    +      severity: 'HIGH,CRITICAL'
    +      exit-code: '1'
    +      ignore-unfixed: true

docs/cks.md (3)
8-11: Quickstart: call out default-deny egress and binding to a dedicated SA.
Prevents accidental wide egress and avoids using default service account.
Proposed bullets:
- Apply a namespace-level default-deny egress policy; then allow only required egress (DNS, object storage).
- Bind workloads to a dedicated ServiceAccount via Role/RoleBinding (avoid default SA).
14-18: NetworkPolicy and admission notes: add DNS allowance and clarify tooling.
Mention explicit DNS egress and name an admission tool to reduce ambiguity.
- “NetworkPolicies restrict pod egress…” → add “and permit DNS to kube-system on TCP/UDP 53.”
- “Admission policies (OPA/Gatekeeper or Kyverno)…” → include a minimal example link/snippet as in security/policy-notes.md.
19-20: Link text vs href mismatch.
Text says “docs/cicd.md” but href is “cicd.md”; align for consistency.

    -Security is continuous; integrate checks into CI/CD as shown in
    -[docs/cicd.md](cicd.md).
    +Security is continuous; integrate checks into CI/CD as shown in
    +[cicd.md](cicd.md).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
docs/img/architecture.svg is excluded by !**/*.svg
📒 Files selected for processing (23)
.github/workflows/build-and-deploy.yml (1 hunks)
README.md (1 hunks)
docs/cicd.md (1 hunks)
docs/cks.md (1 hunks)
docs/observability.md (1 hunks)
docs/overview.md (1 hunks)
docs/schedule-k8s-with-slurm.md (1 hunks)
docs/sunk.md (1 hunks)
docs/tensorizer.md (1 hunks)
docs/vllm.md (1 hunks)
examples/observability/grafana/README.md (1 hunks)
examples/sunk/slurm-pod/README.md (1 hunks)
examples/sunk/slurm-pod/pod.sbatch (1 hunks)
examples/tensorizer/serialize_and_load.py (1 hunks)
examples/vllm/run_vllm_tensorized.sh (1 hunks)
gitops/argocd/app.yaml (1 hunks)
helm/tensorizer-vllm/Chart.yaml (1 hunks)
helm/tensorizer-vllm/templates/deployment.yaml (1 hunks)
helm/tensorizer-vllm/templates/ingress.yaml (1 hunks)
helm/tensorizer-vllm/templates/service.yaml (1 hunks)
helm/tensorizer-vllm/values.yaml (1 hunks)
k8s/knative-service.yaml (1 hunks)
security/policy-notes.md (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/tensorizer/serialize_and_load.py (1)
tensorizer/serialization.py (5)
- TensorDeserializer (1591-3419)
- TensorSerializer (3422-4815)
- write_module (4579-4674)
- key (1497-1505)
- load_into_module (3222-3307)
🪛 LanguageTool
examples/observability/grafana/README.md
[grammar] ~3-~3: There might be a mistake here.
Context: ...d inspect the following dashboards while running the vLLM demo: - **Kubernetes /...
(QB_NEW_EN)
[grammar] ~6-~6: There might be a mistake here.
Context: ...mpute / GPU** – DCGM_FI_DEV_GPU_UTIL, DCGM_FI_DEV_FB_USED - Kubernetes / Networking / Namespace – ...
(QB_NEW_EN)
[grammar] ~7-~7: There might be a mistake here.
Context: ...Kubernetes / Networking / Namespace** – container_network_receive_bytes_total - Loki Logs – query app=vllm For a lo...
(QB_NEW_EN)
security/policy-notes.md
[grammar] ~35-~35: There might be a mistake here.
Context: ...//github.com/sigstore/cosign). Generate SBOMs with [Syft](https://github.com/anc...
(QB_NEW_EN)
[grammar] ~36-~36: There might be a mistake here.
Context: ...chore/syft) and store them alongside the images. Admission controllers should ve...
(QB_NEW_EN)
[grammar] ~37-~37: There might be a mistake here.
Context: ...erify signatures before allowing pods to run.
(QB_NEW_EN)
[grammar] ~38-~38: There might be a mistake here.
Context: ... signatures before allowing pods to run.
(QB_NEW_EN)
docs/tensorizer.md
[grammar] ~3-~3: There might be a mistake here.
Context: ...nto a single .tensors file that can be streamed from HTTP or S3 at wire speed. ...
(QB_NEW_EN)
[grammar] ~12-~12: There might be a mistake here.
Context: ...l, serves it over HTTP, and lazily loads it back into a fresh module. ## Fifteen...
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ... lazy_load=True, num_readers=8)` streams the model directly to CPU or GPU memory....
(QB_NEW_EN)
[grammar] ~20-~20: There might be a mistake here.
Context: ...the model directly to CPU or GPU memory. 4. KNative/KServe benefit from faster cold ...
(QB_NEW_EN)
[grammar] ~21-~21: There might be a mistake here.
Context: ... cold starts because weights are fetched on demand rather than baked into the con...
(QB_NEW_EN)
[grammar] ~24-~24: There might be a mistake here.
Context: ... network limits: on 40GbE expect ~5GB/s.
(QB_NEW_EN)
docs/overview.md
[grammar] ~3-~3: There might be a mistake here.
Context: ...tack for high performance model serving. It combines **Slurm on Kubernetes (SUNK)...
(QB_NEW_EN)
[grammar] ~4-~4: There might be a mistake here.
Context: ... Tensorizer, vLLM, and CoreWeave observability to provide fast, reproduci...
(QB_NEW_EN)
docs/vllm.md
[grammar] ~3-~3: There might be a mistake here.
Context: ...roject/vllm) can load tensorized weights without conversion. ## Five‑Minute Quic...
(QB_NEW_EN)
[grammar] ~16-~16: There might be a mistake here.
Context: ...rreads weights from disk, HTTP, or S3. 2. Environment variables likeVLLM_WORKER_...
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ...pose time‑to‑first‑token and tokens/sec. 4. Scale out with KServe or plain Deploymen...
(QB_NEW_EN)
[grammar] ~20-~20: There might be a mistake here.
Context: ...lain Deployments using the Helm chart in [helm/tensorizer-vllm](../helm/tensori...
(QB_NEW_EN)
docs/sunk.md
[grammar] ~3-~3: There might be a mistake here.
Context: ...nodes inside Kubernetes Pods. Slurm and native Kubernetes workloads can share th...
(QB_NEW_EN)
[grammar] ~4-~4: There might be a mistake here.
Context: ...share the same cluster while maintaining isolation. ``` +-----------------------...
(QB_NEW_EN)
[grammar] ~35-~35: There might be a mistake here.
Context: ...launch native Pods alongside batch jobs. 4. Metrics and logs are exported to CoreWea...
(QB_NEW_EN)
[grammar] ~38-~38: There might be a mistake here.
Context: ...creating a Kubernetes Pod from Slurm see [examples/sunk/slurm-pod](../examples/su...
(QB_NEW_EN)
[grammar] ~39-~39: There might be a mistake here.
Context: ...](../examples/sunk/slurm-pod/README.md).
(QB_NEW_EN)
docs/cicd.md
[grammar] ~9-~9: There might be a mistake here.
Context: ...o the registry and Helm is upgraded in a test namespace. ## Fifteen‑Minute Deep ...
(QB_NEW_EN)
[grammar] ~14-~14: There might be a mistake here.
Context: ...istry using OIDC and short‑lived tokens. 2. Secrets are provided via External Secret...
(QB_NEW_EN)
docs/observability.md
[grammar] ~11-~11: There might be a mistake here.
Context: ...twork throughput, and pod restarts while invoking the model. ## Fifteen‑Minute D...
(QB_NEW_EN)
examples/sunk/slurm-pod/README.md
[grammar] ~3-~3: There might be a mistake here.
Context: ...n create a native Kubernetes Pod via the kubernetes plugin. ## Prerequisites ...
(QB_NEW_EN)
README.md
[grammar] ~13-~13: There might be a mistake here.
Context: ...ure.svg) ## Quickstart Map - Overview - SUNK - [Schedule K8s Pods...
(QB_NEW_EN)
[grammar] ~14-~14: There might be a mistake here.
Context: ... - Overview - SUNK - [Schedule K8s Pods with Slurm](docs/sched...
(QB_NEW_EN)
[grammar] ~15-~15: There might be a mistake here.
Context: ...sunk.md) - Schedule K8s Pods with Slurm - Tensorizer - [vLLM]...
(QB_NEW_EN)
[grammar] ~16-~16: There might be a mistake here.
Context: ...chedule-k8s-with-slurm.md) - Tensorizer - vLLM - [Observability](do...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ...Tensorizer - vLLM - Observability - ...
(QB_NEW_EN)
[grammar] ~18-~18: There might be a mistake here.
Context: ... - vLLM - Observability - CI/CD - [Security](docs/c...
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ...ability](docs/observability.md) - CI/CD - Security # tensorizer Mod...
(QB_NEW_EN)
docs/cks.md
[grammar] ~3-~3: There might be a mistake here.
Context: ...etes Certified Security Specialist (CKS) objectives. ## Five‑Minute Quickstart ...
(QB_NEW_EN)
[grammar] ~14-~14: There might be a mistake here.
Context: ...notes.md` for Role/RoleBinding examples. - NetworkPolicies restrict pod egress to O...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
docs/schedule-k8s-with-slurm.md
16-16: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 actionlint (1.7.7)
.github/workflows/build-and-deploy.yml
13-13: the runner of "docker/setup-buildx-action@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 Ruff (0.12.2)
examples/tensorizer/serialize_and_load.py
34-34: Possible binding to all interfaces
(S104)
🔇 Additional comments (1)
README.md (1)
9-9: Verify architecture image path. Ensure docs/img/architecture.svg exists (case-sensitive on some filesystems) and renders on GitHub.
name: build-and-deploy

on:
  push:
    branches: [ main ]
🛠️ Refactor suggestion
Add required permissions for GHCR push (packages: write).
Without this, push to ghcr.io can fail.
 name: build-and-deploy
 on:
   push:
     branches: [ main ]
+permissions:
+  contents: read
+  packages: write
+  id-token: write
🤖 Prompt for AI Agents
.github/workflows/build-and-deploy.yml lines 1-6: the workflow lacks required
permissions for pushing to GHCR, causing push failures; add a top-level
permissions block (under the workflow root, before jobs) and set packages: write
(and optionally contents: read) so the runner can push images to ghcr.io; ensure
indentation matches YAML (permissions: packages: write).
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker
        uses: docker/setup-buildx-action@v2
🛠️ Refactor suggestion
Update setup-buildx-action to v3 (actionlint failure).
v2 is flagged incompatible on current runners.
-      - name: Set up Docker
-        uses: docker/setup-buildx-action@v2
+      - name: Set up Docker
+        uses: docker/setup-buildx-action@v3
🧰 Tools
🪛 actionlint (1.7.7)
13-13: the runner of "docker/setup-buildx-action@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
.github/workflows/build-and-deploy.yml around line 13: the workflow uses
docker/setup-buildx-action@v2 which is flagged incompatible on current runners;
update the action reference to docker/setup-buildx-action@v3 (or the latest v3
tag) in the uses field, run a quick syntax/lint check and re-run the workflow to
ensure compatibility with the current runners.
        run: docker build -t example/vllm:${{ github.sha }} .
      - name: Scan
        uses: aquasecurity/trivy-action@0.20.0
        with:
          image-ref: example/vllm:${{ github.sha }}
      - name: Login
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Push
        run: docker push example/vllm:${{ github.sha }}
      - name: Helm Upgrade
        run: helm upgrade --install tensorizer helm/tensorizer-vllm --set image=example/vllm:${{ github.sha }} --namespace test --create-namespace
Tag to GHCR consistently and reuse a single IMAGE var across steps.
Current tag example/vllm doesn’t match GHCR login and will push to Docker Hub.
+      - name: Set image tag
+        run: echo "IMAGE=ghcr.io/${{ github.repository_owner }}/tensorizer-vllm:${{ github.sha }}" >> $GITHUB_ENV
       - name: Build
-        run: docker build -t example/vllm:${{ github.sha }} .
+        run: docker build -t "$IMAGE" .
       - name: Scan
         uses: aquasecurity/trivy-action@0.20.0
         with:
-          image-ref: example/vllm:${{ github.sha }}
+          image-ref: ${{ env.IMAGE }}
       - name: Login
         uses: docker/login-action@v3
         with:
           registry: ghcr.io
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
       - name: Push
-        run: docker push example/vllm:${{ github.sha }}
+        run: docker push "$IMAGE"
       - name: Helm Upgrade
-        run: helm upgrade --install tensorizer helm/tensorizer-vllm --set image=example/vllm:${{ github.sha }} --namespace test --create-namespace
+        run: helm upgrade --install tensorizer helm/tensorizer-vllm --set image=${IMAGE} --namespace test --create-namespace
🤖 Prompt for AI Agents
.github/workflows/build-and-deploy.yml lines 15-29: the workflow builds, scans,
logs in and pushes an image but uses inconsistent name "example/vllm" which
targets Docker Hub while the login is for GHCR; change to define a single IMAGE
variable (e.g. IMAGE=ghcr.io/${{ github.repository_owner }}/vllm:${{ github.sha
}}) at job or step level and reference that variable in all steps (docker build
-t $IMAGE ., trivy-action image-ref: $IMAGE, docker push $IMAGE, and helm
upgrade --set image=$IMAGE ...) so the same GHCR-qualified tag is used
everywhere and removes the mismatch.
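One detail worth checking when adopting the single `IMAGE` variable: Docker-style registries reject repository names containing uppercase characters, while `github.repository_owner` keeps the account's original casing. The step below is a minimal sketch of lowercasing the owner with bash parameter expansion. It assumes the default bash shell on a Linux runner and the `tensorizer-vllm` image name from the suggested diff; the `OWNER` variable is introduced here purely for illustration.

```yaml
      # Sketch only: a variant of the "Set image tag" step.
      # OWNER is a hypothetical helper variable, not part of the PR.
      - name: Set image tag
        env:
          OWNER: ${{ github.repository_owner }}
        run: |
          # ${OWNER,,} lowercases the value (bash 4+); registry paths must be lowercase.
          echo "IMAGE=ghcr.io/${OWNER,,}/tensorizer-vllm:${GITHUB_SHA}" >> "$GITHUB_ENV"
```

Later steps can keep reading `$IMAGE` or `${{ env.IMAGE }}` unchanged, since only the way the value is computed differs.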
      - name: Push
        run: docker push example/vllm:${{ github.sha }}
      - name: Helm Upgrade
        run: helm upgrade --install tensorizer helm/tensorizer-vllm --set image=example/vllm:${{ github.sha }} --namespace test --create-namespace
Helm step lacks Helm installation and kubeconfig; deployment will fail.
Install Helm and configure cluster auth (e.g., from a base64 KUBECONFIG secret) before running upgrade.
       - name: Push
         run: docker push "$IMAGE"
+      - name: Install Helm
+        uses: azure/setup-helm@v4
+      - name: Configure kubeconfig
+        if: ${{ secrets.KUBECONFIG_B64 != '' }}
+        run: |
+          echo "${KUBECONFIG_B64}" | base64 -d > $HOME/kubeconfig
+          echo "KUBECONFIG=$HOME/kubeconfig" >> $GITHUB_ENV
+        env:
+          KUBECONFIG_B64: ${{ secrets.KUBECONFIG_B64 }}
       - name: Helm Upgrade
-        run: helm upgrade --install tensorizer helm/tensorizer-vllm --set image=${IMAGE} --namespace test --create-namespace
+        run: helm upgrade --install tensorizer helm/tensorizer-vllm --set image=${IMAGE} --namespace test --create-namespace
Optionally gate with environments and approvals (environment: test) and add concurrency to avoid overlapping deploys.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In .github/workflows/build-and-deploy.yml around line 29, the Helm deploy step
runs helm upgrade without ensuring Helm is installed or a kubeconfig is
configured; before the helm upgrade, add steps to install Helm (for example with
azure/setup-helm) and set up cluster auth by decoding a base64 KUBECONFIG secret
into $HOME/.kube/config (or use azure/k8s-set-context or
google-github-actions/get-gke-credentials as appropriate), log in to the chart
registry if needed, and only after that run helm upgrade --install;
additionally, gate this job with an environment: test and required
reviewers/approvals and add a concurrency group to the workflow to prevent
overlapping deploys.
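The gating and concurrency advice can be sketched at the job level as follows. This is an illustration under stated assumptions, not the PR's workflow: the job name `build-and-deploy`, the concurrency group name `deploy-test`, and the `KUBECONFIG_B64` secret are placeholders. It also exposes the secret through a job-level `env` entry, because GitHub's documentation notes that secrets cannot be referenced directly in `if:` conditionals.

```yaml
jobs:
  build-and-deploy:                 # hypothetical job name
    runs-on: ubuntu-latest
    environment: test               # approvals are configured on the environment in repo settings
    concurrency:
      group: deploy-test            # hypothetical group name; serializes deploys to this target
      cancel-in-progress: false     # let a running deploy finish instead of cancelling it
    env:
      # Job-level env makes the secret testable from step-level `if:` expressions.
      KUBECONFIG_B64: ${{ secrets.KUBECONFIG_B64 }}
    steps:
      - uses: actions/checkout@v4
      - name: Configure kubeconfig
        if: ${{ env.KUBECONFIG_B64 != '' }}
        run: |
          echo "${KUBECONFIG_B64}" | base64 -d > "$HOME/kubeconfig"
          chmod 600 "$HOME/kubeconfig"
          echo "KUBECONFIG=$HOME/kubeconfig" >> "$GITHUB_ENV"
```

Setting `cancel-in-progress: false` queues a newer deploy behind the running one, which is usually the safer default for a Helm release.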
1. The `kube-state-metrics` and `node-exporter` dashboards show cluster health.
2. vLLM exports Prometheus metrics such as `vllm_engine_execution_time`.
3. Logs are collected via Loki; search by `app=vllm`.
Correct Loki query syntax
Use {app="vllm"} rather than app=vllm.
-3. Logs are collected via Loki; search by `app=vllm`.
+3. Logs are collected via Loki; search by `{app="vllm"}`.
🤖 Prompt for AI Agents
In docs/observability.md around line 18, the Loki query example uses incorrect
syntax "app=vllm"; replace it with the proper label matcher form {app="vllm"} so
examples and instructions use valid Loki query syntax; update the text to show
the corrected query and ensure any surrounding examples use the same
label-matcher format.
        autoscaling.knative.dev/minScale: "0"
    spec:
      containers:
        - image: vllm/vllm:latest
🛠️ Refactor suggestion
Pin the image (avoid latest) and align with CI/CD output
Use a tagged release or digest, or the CI-produced image (e.g., example/vllm:${{ github.sha }}) for reproducibility and supply-chain integrity.
-        - image: vllm/vllm:latest
+        - image: example/vllm:{{ .Values.image.tag | default "sha-<commit>" }}
If not using Helm templating here, pin to a specific version or digest.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In k8s/knative-service.yaml around line 12, the image is using the mutable tag
"vllm/vllm:latest"; replace it with a pinned image reference for reproducibility
and supply-chain integrity—either the CI-built image variable (e.g.,
example/vllm:${{ github.sha }}) or a fixed semver tag or immutable digest
(sha256:...), and ensure the manifest uses the same tag/digest the pipeline
publishes (or add templating if needed) so deployments always pull the exact
intended image.
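For a plain manifest without Helm templating, pinning by digest could look like the sketch below. The Service name `vllm`, the `ghcr.io/<owner>/tensorizer-vllm` repository, and the `<digest>` value are placeholders to be replaced with whatever the pipeline actually publishes; only the `minScale` annotation and the container layout come from the reviewed file.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: vllm                        # hypothetical name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"
    spec:
      containers:
        # An immutable digest reference: the same bytes are pulled on every cold start.
        # <owner> and <digest> are placeholders, not real values.
        - image: ghcr.io/<owner>/tensorizer-vllm@sha256:<digest>
```

A digest reference also makes rollbacks unambiguous, because each Knative Revision then records exactly which image it ran.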
    spec:
      containers:
        - image: vllm/vllm:latest
          args: ["serve", "--model", "s3://my-bucket/models/tiny-gpt2.tensors", "--tensorizer"]
🛠️ Refactor suggestion
S3 credentials/config not provided
Loading from s3:// will fail without credentials. Mount a Secret or set env vars (and prefer an S3-compatible endpoint if using CoreWeave Object Storage).
           args: ["serve", "--model", "s3://my-bucket/models/tiny-gpt2.tensors", "--tensorizer"]
+          env:
+            - name: AWS_ACCESS_KEY_ID
+              valueFrom:
+                secretKeyRef: { name: s3-credentials, key: accessKeyId }
+            - name: AWS_SECRET_ACCESS_KEY
+              valueFrom:
+                secretKeyRef: { name: s3-credentials, key: secretAccessKey }
+            - name: AWS_REGION
+              value: us-east-1
+            # Uncomment if using a custom S3 endpoint:
+            # - name: AWS_ENDPOINT_URL_S3
+            #   value: https://<endpoint>
I can draft a Secret manifest if you share the naming you prefer.
🤖 Prompt for AI Agents
In k8s/knative-service.yaml around line 13, the container args reference an
s3:// model path but no S3 credentials or endpoint are provided; add a
Kubernetes Secret (or use an existing one) with AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY (optionally AWS_REGION and S3_ENDPOINT for S3-compatible
providers like CoreWeave), then inject those into the Knative Service container
as environment variables (envFrom: secretRef or individual env entries) or mount
the Secret as files and point the runtime to them; also consider setting
S3_ENDPOINT to the CoreWeave Object Storage endpoint and prefer using the Secret
name you want so I can draft the Secret manifest for you.
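A Secret matching the `secretKeyRef` entries in the suggestion might look like the sketch below. The name `s3-credentials` and the keys `accessKeyId` and `secretAccessKey` are taken from the suggested diff; the values are placeholders, and the manifest assumes it is applied to the same namespace as the Knative Service. Which environment variable names the model loader actually reads should be confirmed against the vLLM and Tensorizer documentation before relying on this.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials              # must match secretKeyRef.name in the Service
type: Opaque
stringData:
  # stringData accepts plain text; the API server stores it base64-encoded under data.
  accessKeyId: <your-access-key-id>           # placeholder
  secretAccessKey: <your-secret-access-key>   # placeholder
```

Committing real credentials in a manifest like this is best avoided; creating the Secret out of band or through a secrets operator keeps them out of the repository.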
[](https://github.com/coreweave/tensorizer/actions/workflows/build-and-deploy.yml)
[](#)
[](#)
🛠️ Refactor suggestion
Fix workflow badge to this repo (not coreweave/tensorizer).
Current badge URLs point to coreweave/tensorizer, so status is misleading in this fork/PR.
Apply:
-[](https://github.com/coreweave/tensorizer/actions/workflows/build-and-deploy.yml)
+[](https://github.com/TylrDn/tensorizer/actions/workflows/build-and-deploy.yml)
🤖 Prompt for AI Agents
In README.md around lines 3 to 5, the GitHub Actions workflow badge links to
coreweave/tensorizer which is incorrect for this fork; update the badge URL so
it points to this repository's workflows (replace coreweave/tensorizer in the
badge link and image URL with the current GitHub owner/repo or use the
repository-relative workflow path), leaving the rest of the markdown intact;
verify the badge link and image render the correct status for this repo.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-runner
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "get", "list"]
```
🛠️ Refactor suggestion
RBAC example: scope and binding are missing; add namespace and RoleBinding.
Roles are namespace-scoped; without a RoleBinding the example isn’t actionable.
Apply:
 kind: Role
 metadata:
   name: pod-runner
+  namespace: workloads
 rules:
   - apiGroups: [""]
     resources: ["pods"]
-    verbs: ["create", "get", "list"]
+    verbs: ["create", "get", "list"]
Add this RoleBinding right below to bind a least-privileged SA:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-runner
  namespace: workloads
subjects:
  - kind: ServiceAccount
    name: runner
    namespace: workloads
roleRef:
  kind: Role
  name: pod-runner
  apiGroup: rbac.authorization.k8s.io
🤖 Prompt for AI Agents
security/policy-notes.md around lines 5 to 14: the RBAC Role example is missing
namespace scope and a RoleBinding so it’s not actionable; add
metadata.namespace: workloads to the Role and immediately add a RoleBinding
resource that binds the least-privileged ServiceAccount named "runner" in the
workloads namespace to the pod-runner Role (roleRef.kind: Role, roleRef.name:
pod-runner, apiGroup: rbac.authorization.k8s.io) and set the
RoleBinding.subjects with kind: ServiceAccount, name: runner, namespace:
workloads.
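The Role and RoleBinding only take effect if the `runner` ServiceAccount exists and a workload actually uses it. A minimal sketch of both pieces follows, reusing the `runner` and `workloads` names from the suggestion; the Pod name and image are placeholders invented for illustration.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: runner
  namespace: workloads
---
apiVersion: v1
kind: Pod
metadata:
  name: example-workload            # hypothetical name
  namespace: workloads
spec:
  serviceAccountName: runner        # without this the Pod runs as the namespace's default SA
  containers:
    - name: main
      image: registry.example.com/app:1.0.0   # placeholder image
# To check the binding once applied:
#   kubectl auth can-i create pods -n workloads \
#     --as=system:serviceaccount:workloads:runner
```

The `kubectl auth can-i` check is a quick way to confirm that the binding grants exactly the verbs listed in the Role and nothing more.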
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              access: object-storage
```
🛠️ Refactor suggestion
NetworkPolicy will break DNS; explicitly allow DNS egress.
As written, all egress except object storage is denied; pods will fail DNS lookups.
Apply:
 spec:
   podSelector: {}
   policyTypes: [Egress]
   egress:
     - to:
         - namespaceSelector:
             matchLabels:
               access: object-storage
+    - to:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: kube-system
+      ports:
+        - protocol: UDP
+          port: 53
+        - protocol: TCP
+          port: 53
Optional: restrict by IPBlock for your DNS service if labels differ.
🤖 Prompt for AI Agents
In security/policy-notes.md around lines 18 to 31, the NetworkPolicy as shown
denies all egress except to namespaces labeled access=object-storage which will
break DNS resolution; update the policy to explicitly allow egress to your DNS
servers by adding an egress rule for DNS (UDP and TCP port 53) either by
allowing traffic to the kube-dns/coredns namespaceSelector (matchLabels:
k8s-app: kube-dns or your cluster label) or by adding an ipBlock/CIDR for your
DNS service, ensuring both UDP and TCP port 53 are permitted before the
restrictive object-storage egress rule.
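A tighter variant of the DNS allowance restricts the rule to the DNS pods themselves rather than the whole `kube-system` namespace. The sketch below assumes the cluster's DNS pods carry the common `k8s-app: kube-dns` label and that the policy lives in a `workloads` namespace; both are assumptions to verify against the target cluster. Clusters that use a node-local DNS cache may need an `ipBlock` rule instead.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress
  namespace: workloads              # assumed namespace
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              access: object-storage
    - to:
        # namespaceSelector and podSelector in ONE list item are ANDed:
        # only DNS pods inside kube-system are reachable.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns     # assumed label; check your DNS deployment
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Writing the two selectors as separate list items would OR them instead, which would open port 53 to every pod labeled `k8s-app: kube-dns` in the policy's own namespace as well as to all of `kube-system`.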
Summary
Testing
- `python -m black examples/tensorizer/serialize_and_load.py`
- `isort examples/tensorizer/serialize_and_load.py`
- `pip install -r requirements.txt`
- `pip install -r tests/requirements.txt`
- `python -m unittest discover tests -v` (fails: Killed)
- `PYTHONPATH=. python examples/tensorizer/serialize_and_load.py --local-only` (fails: ProxyError)
- `linkchecker docs/overview.md` (command not found)
https://chatgpt.com/codex/tasks/task_e_68b224c0d1d08323b92fc953c577dc16
Summary by Sourcery
Add comprehensive CoreWeave full-stack examples with documentation, deployment configs, and CI/CD workflows to showcase SUNK, Tensorizer, vLLM, and observability.
New Features:
Enhancements:
CI:
Deployment:
Documentation:
Chores:
Summary by CodeRabbit
New Features
Documentation
Examples
Chores