26 changes: 21 additions & 5 deletions .github/workflows/build-and-deploy.yml
@@ -8,22 +8,38 @@ jobs:
   build:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@b4ffde82f28162d9a60b3dfb39e4f2447a3b1c7f
       - name: Set up Docker
-        uses: docker/setup-buildx-action@v2
+        uses: docker/setup-buildx-action@f3821f4794d9a373d160d5e86f922b622d21b008
+      - name: Set up kubectl
+        uses: azure/setup-kubectl@bd24c49a951a6eb2de75e8b4785905a9a05fd85e
+        with:
+          version: v1.28.3
+      - name: Set up Helm
+        uses: azure/setup-helm@e9a68a7554547a8bb11e9a56ce3cfd31c8eaa974
+        with:
+          version: v3.12.3
+      - name: Configure kubeconfig
+        run: |
+          mkdir -p ~/.kube
+          echo "${{ secrets.KUBECONFIG_DATA }}" | base64 -d > ~/.kube/config
       - name: Build
         run: docker build -t example/vllm:${{ github.sha }} .
       - name: Scan
-        uses: aquasecurity/trivy-action@0.20.0
+        uses: aquasecurity/trivy-action@4d1a13b66e041b35769128a8b06845050806e06e
         with:
           image-ref: example/vllm:${{ github.sha }}
       - name: Login
-        uses: docker/login-action@v3
+        uses: docker/login-action@ee0af82ac35b689a7dce1aa3e9e4b0943aa53d25
         with:
           registry: ghcr.io
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
       - name: Push
         run: docker push example/vllm:${{ github.sha }}
       - name: Helm Upgrade
-        run: helm upgrade --install tensorizer helm/tensorizer-vllm --set image=example/vllm:${{ github.sha }} --namespace test --create-namespace
+        run: |
+          helm upgrade --install tensorizer helm/tensorizer-vllm \
+            --set image=example/vllm:${{ github.sha }} \
+            --namespace test \
+            --create-namespace
5 changes: 3 additions & 2 deletions docs/observability.md
@@ -15,8 +15,9 @@ invoking the model.

 1. The `kube-state-metrics` and `node-exporter` dashboards show cluster health.
 2. vLLM exports Prometheus metrics such as `vllm_engine_execution_time`.
-3. Logs are collected via Loki; search by `app=vllm`.
-4. For a local demo use the [`observability/` example](../examples/observability/grafana/README.md)
+3. Ensure Prometheus scrapes the vLLM service on port `8000` to populate Grafana.
+4. Logs are collected via Loki; search by `app=vllm`.
+5. For a local demo use the [`observability/` example](../examples/observability/grafana/README.md)
    which spins up Prometheus and Grafana with Docker Compose.

 Screenshots can be added to `docs/img/` for presentations.
5 changes: 3 additions & 2 deletions examples/tensorizer/serialize_and_load.py
@@ -7,6 +7,7 @@
 import argparse
 import os
 import threading
+from functools import partial
 from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

 import torch
@@ -30,8 +31,8 @@ def upload_to_s3(path: str, bucket: str, key: str) -> None:

 def serve_file(path: str, port: int) -> threading.Thread:
     directory = os.path.dirname(os.path.abspath(path))
-    os.chdir(directory)
-    server = ThreadingHTTPServer(("0.0.0.0", port), SimpleHTTPRequestHandler)
+    handler = partial(SimpleHTTPRequestHandler, directory=directory)
+    server = ThreadingHTTPServer(("0.0.0.0", port), handler)
     thread = threading.Thread(target=server.serve_forever, daemon=True)
     thread.start()
     return thread
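
Note on the `serve_file` change above: binding the directory with `functools.partial` avoids the process-wide side effect of `os.chdir`. The following is a minimal standalone sketch of the same pattern, not the code from this PR. It assumes Python 3.7+ (where `SimpleHTTPRequestHandler` accepts a `directory` argument); `serve_directory`, `demo`, and the `model.tensors` file name are hypothetical names used only for illustration.

```python
import http.client
import os
import tempfile
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple


def serve_directory(directory: str, port: int = 0) -> ThreadingHTTPServer:
    """Serve `directory` over HTTP without changing the working directory.

    Port 0 asks the OS for a free port; the bound port is available
    afterwards as `server.server_address[1]`.
    """
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo() -> Tuple[bytes, bool]:
    """Serve a temp file, fetch it, and report whether the cwd was left alone."""
    cwd_before = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical payload standing in for a serialized model file.
        with open(os.path.join(tmp, "model.tensors"), "wb") as f:
            f.write(b"fake-tensor-bytes")
        server = serve_directory(tmp)
        try:
            port = server.server_address[1]
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", "/model.tensors")
            body = conn.getresponse().read()
            conn.close()
        finally:
            server.shutdown()
            server.server_close()
    return body, os.getcwd() == cwd_before
```

Unlike the sketch, `serve_file` in the PR binds to `0.0.0.0` and returns only the thread, so callers there have no handle for shutting the server down; returning the server object, as here, is one possible way to allow that.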
2 changes: 1 addition & 1 deletion gitops/argocd/app.yaml
@@ -9,7 +9,7 @@ spec:
   source:
     repoURL: https://github.com/coreweave/tensorizer
     path: helm/tensorizer-vllm
-    targetRevision: HEAD
+    targetRevision: main
   project: default
   syncPolicy:
     automated:
13 changes: 13 additions & 0 deletions helm/tensorizer-vllm/templates/deployment.yaml
@@ -18,3 +18,16 @@ spec:
           args: ["serve", "--model", "{{ .Values.modelURI }}", "--tensorizer"]
           ports:
             - containerPort: 8000
+          {{- if .Values.s3.secretName }}
+          env:
+            - name: AWS_ACCESS_KEY_ID
+              valueFrom:
+                secretKeyRef:
+                  name: {{ .Values.s3.secretName }}
+                  key: accessKeyId
+            - name: AWS_SECRET_ACCESS_KEY
+              valueFrom:
+                secretKeyRef:
+                  name: {{ .Values.s3.secretName }}
+                  key: secretAccessKey
+          {{- end }}
4 changes: 3 additions & 1 deletion helm/tensorizer-vllm/values.yaml
@@ -1,3 +1,5 @@
-image: "vllm/vllm:latest"
+image: "vllm/vllm:0.2.2"
 modelURI: "s3://my-bucket/models/tiny-gpt2.tensors"
 host: "vllm.example.com"
+s3:
+  secretName: ""
13 changes: 12 additions & 1 deletion k8s/knative-service.yaml
@@ -9,7 +9,18 @@ spec:
         autoscaling.knative.dev/minScale: "0"
     spec:
       containers:
-        - image: vllm/vllm:latest
+        - image: vllm/vllm:0.2.2
           args: ["serve", "--model", "s3://my-bucket/models/tiny-gpt2.tensors", "--tensorizer"]
+          env:
+            - name: AWS_ACCESS_KEY_ID
+              valueFrom:
+                secretKeyRef:
+                  name: s3-credentials
+                  key: accessKeyId
+            - name: AWS_SECRET_ACCESS_KEY
+              valueFrom:
+                secretKeyRef:
+                  name: s3-credentials
+                  key: secretAccessKey
           ports:
             - containerPort: 8000
23 changes: 23 additions & 0 deletions k8s/networkpolicy.yaml
@@ -0,0 +1,23 @@
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: tensorizer-allow
+spec:
+  podSelector:
+    matchLabels:
+      app: tensorizer
+  policyTypes:
+    - Ingress
+    - Egress
+  ingress:


🚨 suggestion (security): Ingress policy allows traffic from any namespace, which may be overly permissive.

Restrict ingress to trusted namespaces to minimize exposure and enhance security.

Suggested implementation:

    - from:
        - namespaceSelector:
            matchLabels:
              team: trusted
      ports:
        - protocol: TCP
          port: 8000

You will need to ensure that the trusted namespaces in your cluster are labeled with team: trusted for this policy to work as intended. Adjust the label key/value as needed to match your organization's labeling conventions.

+    - from:
+        - namespaceSelector: {}
+      ports:
+        - protocol: TCP
+          port: 8000
+  egress:
+    - to:
+        - namespaceSelector: {}
+      ports:
+        - protocol: TCP
+          port: 443
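
The concern raised in the review comment above (an empty `namespaceSelector: {}` matches every namespace) could also be checked mechanically before merge. The following is a rough sketch under stated assumptions: the policy has already been loaded into a plain Python dict (for example by a YAML parser, not shown), and `open_namespace_peers` and `POLICY` are hypothetical names, not part of this PR or of any Kubernetes client library.

```python
from typing import Any, Dict, List


def open_namespace_peers(policy: Dict[str, Any]) -> List[str]:
    """Return the paths of ingress/egress peers whose namespaceSelector is `{}`.

    In a NetworkPolicy, an empty namespaceSelector selects all namespaces.
    """
    findings: List[str] = []
    spec = policy.get("spec", {})
    for direction, peer_key in (("ingress", "from"), ("egress", "to")):
        for i, rule in enumerate(spec.get(direction) or []):
            for j, peer in enumerate(rule.get(peer_key) or []):
                if peer.get("namespaceSelector") == {}:
                    findings.append(f"spec.{direction}[{i}].{peer_key}[{j}]")
    return findings


# The policy from k8s/networkpolicy.yaml in this PR, written as a dict.
POLICY = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "tensorizer-allow"},
    "spec": {
        "podSelector": {"matchLabels": {"app": "tensorizer"}},
        "policyTypes": ["Ingress", "Egress"],
        "ingress": [
            {
                "from": [{"namespaceSelector": {}}],
                "ports": [{"protocol": "TCP", "port": 8000}],
            }
        ],
        "egress": [
            {
                "to": [{"namespaceSelector": {}}],
                "ports": [{"protocol": "TCP", "port": 443}],
            }
        ],
    },
}

if __name__ == "__main__":
    for path in open_namespace_peers(POLICY):
        print(f"open to all namespaces: {path}")
```

Run against the policy as submitted, the check flags both the ingress and the egress peer. With the reviewer's suggested `matchLabels: {team: trusted}` applied to ingress, only the egress peer would remain flagged.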
25 changes: 25 additions & 0 deletions k8s/rbac.yaml
@@ -0,0 +1,25 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: tensorizer-sa
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: tensorizer-role
+rules:
+  - apiGroups: [""]
+    resources: ["pods"]
+    verbs: ["get", "list"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: tensorizer-rolebinding
+subjects:
+  - kind: ServiceAccount
+    name: tensorizer-sa
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: tensorizer-role