This guide will walk you through deploying a simple disaggregated AI workload using Grove in about 10 minutes.
By the end of this quickstart, you'll understand how to:
- Deploy a multi-component workload with Grove
- Scale components independently or as a group
- Observe gang scheduling in action
- A Kubernetes cluster (we'll use kind for local testing)
kubectlinstalled and configured- Docker Desktop running (for kind)
We'll deploy a simple workload that mimics a disaggregated inference setup with four components:
- Role A (pca): Auto-scaling component (e.g., routing layer) - scales based on CPU
- Role B (pcb) and Role C (pcc): Tightly-coupled components (e.g., prefill + decode) - must scale together as a group
- Role D (pcd): Fixed-size component (e.g., model cache) - doesn't auto-scale
This demonstrates Grove's key capabilities: individual auto-scaling, grouped scaling, and gang scheduling.
Follow the installation guide to:
- Create a kind cluster with
make kind-up - Export the KUBECONFIG
- Deploy Grove with
make deploy
Quick check: Verify Grove operator is running:
kubectl get pods -l app.kubernetes.io/name=grove-operatorYou should see a pod in Running status.
Create a PodCliqueSet that defines all four components:
cd operator
kubectl apply -f samples/simple/simple1.yamlWhat just happened? Grove created:
- 1
PodCliqueSet(the top-level resource) - 4
PodCliques(one for each component role) - 1
PodCliqueScalingGroup(grouping pcb and pcc) - 1
PodGang(for gang scheduling) - 9
Pods(3 for pca, 2 each for pcb/pcc/pcd)
Watch as Grove creates and schedules all resources:
kubectl get pcs,pclq,pcsg,pg,pod -o wideExpected output:
NAME AGE
podcliqueset.grove.io/simple1 34s
NAME AGE
podclique.grove.io/simple1-0-pca 33s
podclique.grove.io/simple1-0-pcd 33s
podclique.grove.io/simple1-0-sga-0-pcb 33s
podclique.grove.io/simple1-0-sga-0-pcc 33s
NAME AGE
podcliquescalinggroup.grove.io/simple1-0-sga 33s
NAME AGE
podgang.scheduler.grove.io/simple1-0 33s
NAME READY STATUS RESTARTS AGE
pod/grove-operator-699c77979f-7x2zc 1/1 Running 0 51s
pod/simple1-0-pca-pkl2b 1/1 Running 0 33s
pod/simple1-0-pca-s7dz2 1/1 Running 0 33s
pod/simple1-0-pca-wjfqz 1/1 Running 0 33s
pod/simple1-0-pcd-l4vnk 1/1 Running 0 33s
pod/simple1-0-pcd-s7687 1/1 Running 0 33s
pod/simple1-0-sga-0-pcb-m9shj 1/1 Running 0 33s
pod/simple1-0-sga-0-pcb-vnrqw 1/1 Running 0 33s
pod/simple1-0-sga-0-pcc-g8rg8 1/1 Running 0 33s
pod/simple1-0-sga-0-pcc-hx4zn 1/1 Running 0 33s
Key observation: Notice all pods reached Running state together - that's gang scheduling! Grove ensured all pods were scheduled before starting any of them.
Scale the tightly-coupled components (pcb and pcc) as a group:
kubectl scale pcsg simple1-0-sga --replicas=2Observe the new resources:
kubectl get pcs,pclq,pcsg,pg,pod -o wideWhat happened?
- Grove created a new replica of the scaling group
- Both
pcbandpccscaled together from 2 to 4 pods each - A new
PodGangwas created for gang scheduling the new replica - New
PodCliquesappeared:simple1-0-sga-1-pcbandsimple1-0-sga-1-pcc
Scale back down:
kubectl scale pcsg simple1-0-sga --replicas=1Scale the entire workload to create a complete second instance:
kubectl scale pcs simple1 --replicas=2Check the resources:
kubectl get pcs,pclq,pcsg,pg,pod -o wideWhat happened?
- Grove created a complete duplicate of your workload
- All new resources have
-1-in their names (second replica) - A new
PodGang(simple1-1) was created - All components (pca, pcb, pcc, pcd) were duplicated
Scale back:
kubectl scale pcs simple1 --replicas=1Let's visualize what you just created:
PodCliqueSet (simple1)
├── Replica 0 (PodGang: simple1-0)
│ ├── PodClique: pca (3 pods)
│ ├── PodClique: pcd (2 pods)
│ └── PodCliqueScalingGroup: sga
│ ├── PodClique: pcb (2 pods)
│ └── PodClique: pcc (2 pods)
└── Replica 1 (when scaled to 2)
└── [same structure]
Scaling behaviors:
kubectl scale pcs simple1: Creates/removes complete replicaskubectl scale pcsg simple1-0-sga: Scales just the grouped components (pcb + pcc)- Auto-scaling: Grove can automatically create and manage HPA resources based on the
ScaleConfigdefined at PodClique and PodCliqueScalingGroup levels of a PodCliqueSet, or delegate scaling responsibility to a custom autoscaler, such as Dynamo Planer.
Remove the sample workload:
kubectl delete -f samples/simple/simple1.yamlVerify cleanup:
kubectl get pcs,pclq,pcsg,pg,podOnly the Grove operator pod should remain.
Now that you understand the basics, explore:
- Installation Guide - Learn more about local and remote cluster deployment
- Core Concepts Tutorial - Step-by-step hands-on tutorial on Grove application development
- API Reference - Deep dive into all configuration options
- Samples - Explore more examples
| Concept | What It Does | When to Use |
|---|---|---|
| PodCliqueSet | Top-level resource defining your workload | Every Grove deployment |
| PodClique | Group of pods with the same role | Each component type in your system |
| PodCliqueScalingGroup | Multiple PodCliques that scale together | Tightly-coupled components (prefill+decode) |
| PodGang | Ensures all components are co-scheduled | Automatically created by Grove |
- Check PodGang status:
kubectl describe pg simple1-0 - Ensure your cluster has enough resources for gang scheduling
- For kind clusters, install metrics-server (choose one of following methods):
- Use
operator/Makefiletarget
make deploy-addons
- Manual setup
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
- Use
- Add
--kubelet-insecure-tlsto metrics-server deployment for kind
- See the full Troubleshooting Guide
- Join the Grove mailing list
- File an issue