@@ -17,28 +17,30 @@ Grove's naming scheme serves two critical purposes:
For PodCliques that are **not** part of a PodCliqueScalingGroup, the pod naming follows this pattern:

```
<pcs-name>-<pcs-replica-index>-<pclq-name>-<random-suffix>
<pcs-name>-<pcs-replica-index>-<pclq-name>-<pod-index>-<random-suffix>
```

**Components:**
- `<pcs-name>`: The name of the PodCliqueSet
- `<pcs-replica-index>`: The replica index of the PodCliqueSet (0-based)
- `<pclq-name>`: The name of the PodClique template defined in the PodCliqueSet spec
- `<random-suffix>`: A random 5-character suffix generated by Kubernetes
- `<pod-index>`: The pod index within the PodClique (0-based)
- `<random-suffix>`: A random 5-character suffix generated by Grove

**Example:** `multinode-disaggregated-0-frontend-a7b3c`
**Example:** `multinode-disaggregated-0-frontend-0-a7b9x`

Looking at this name, you can immediately tell:
- It belongs to the `multinode-disaggregated` PodCliqueSet
- It's part of PodCliqueSet replica 0
- It's from the `frontend` PodClique
- It is pod index 0 within the `frontend` PodClique

### PodCliques in a PodCliqueScalingGroup

For PodCliques that **are** part of a PodCliqueScalingGroup, the pod naming includes the PCSG information:

```
<pcs-name>-<pcs-replica-index>-<pcsg-name>-<pcsg-replica-index>-<pclq-name>-<random-suffix>
<pcs-name>-<pcs-replica-index>-<pcsg-name>-<pcsg-replica-index>-<pclq-name>-<pod-index>-<random-suffix>
```

**Components:**
@@ -47,39 +49,42 @@ For PodCliques that **are** part of a PodCliqueScalingGroup, the pod naming incl
- `<pcsg-name>`: The name of the PodCliqueScalingGroup template
- `<pcsg-replica-index>`: The replica index of the PodCliqueScalingGroup (0-based)
- `<pclq-name>`: The name of the PodClique template within the PCSG
- `<random-suffix>`: A random 5-character suffix generated by Kubernetes
- `<pod-index>`: The pod index within the PodClique (0-based)
- `<random-suffix>`: A random 5-character suffix generated by Grove

**Example:** `multinode-disaggregated-0-prefill-1-pworker-m9n0o`
**Example:** `multinode-disaggregated-0-prefill-1-pworker-2-m9n2q`

Looking at this name, you can immediately tell:
- It belongs to the `multinode-disaggregated` PodCliqueSet (replica 0)
- It's part of the `prefill` PodCliqueScalingGroup (replica 1)
- It's from the `pworker` PodClique (prefill worker)
- It's from the `pworker` PodClique (prefill worker), and it is pod index 2 within that PodClique
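
The two patterns above can be sketched as a small helper. This is an illustration of the documented naming scheme only, not Grove's actual implementation; the function name `grove_pod_name` and its parameters are hypothetical, and the suffix is passed in by hand because it is random in a real cluster.

```python
from typing import Optional


def grove_pod_name(
    pcs: str,
    pcs_idx: int,
    pclq: str,
    pod_idx: int,
    suffix: str,
    pcsg: Optional[str] = None,
    pcsg_idx: Optional[int] = None,
) -> str:
    """Assemble a pod name from the documented components (hypothetical helper)."""
    if (pcsg is None) != (pcsg_idx is None):
        raise ValueError("pcsg and pcsg_idx must be given together")
    parts = [pcs, str(pcs_idx)]
    if pcsg is not None:
        # PodCliques inside a PCSG add the PCSG name and its replica index.
        parts += [pcsg, str(pcsg_idx)]
    # The PodClique name, pod index, and random suffix always come last.
    parts += [pclq, str(pod_idx), suffix]
    return "-".join(parts)


# Standalone PodClique (matches the first example above).
print(grove_pod_name("multinode-disaggregated", 0, "frontend", 0, "a7b9x"))
# PodClique inside a PCSG (matches the second example above).
print(grove_pod_name("multinode-disaggregated", 0, "pworker", 2, "m9n2q",
                     pcsg="prefill", pcsg_idx=1))
```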

## Naming Best Practices

### Kubernetes Name Length Limit
### Generated Pod Name and Hostname Length Limits

Kubernetes has a **63-character limit** for pod names. Since Grove constructs full pod names by combining multiple components, you need to be mindful of name lengths when choosing names for your resources.
Grove validates the deterministic pod hostname and the generated pod object name separately. The pod hostname is used for service discovery and must stay within the Kubernetes **63-character DNS label limit**. The pod object name includes an additional 5-character suffix and can be longer than 63 characters because Kubernetes treats pod names as DNS subdomains, which may be up to 253 characters long.

**How Grove constructs names:**

For standalone PodCliques, the final pod name is:
```
<pcs-name>-<pcs-replica-idx>-<pclq-name>-<5-char-suffix>
<pcs-name>-<pcs-replica-idx>-<pclq-name>-<pod-index>-<5-char-suffix>
```

For PodCliques in a PCSG, the final pod name is:
```
<pcs-name>-<pcs-replica-idx>-<pcsg-name>-<pcsg-replica-idx>-<pclq-name>-<5-char-suffix>
<pcs-name>-<pcs-replica-idx>-<pcsg-name>-<pcsg-replica-idx>-<pclq-name>-<pod-index>-<5-char-suffix>
```

**Character budget breakdown:**
- `<5-char-suffix>`: 5 characters (fixed by Kubernetes)
- `-` separators: 3-5 characters depending on structure
- Replica indices: 1+ characters each (single digit for 0-9, two digits for 10-99, etc.)
- `<5-char-suffix>`: 5 characters
- `-` separators: 4-6 characters depending on structure
- Replica and pod indices: 1+ characters each (single digit for 0-9, two digits for 10-99, etc.)
- Your chosen names: Remaining characters

When planning names, make sure the generated hostname fits within 63 characters. Grove allows the final pod object name to exceed 63 characters, but users should use Grove's environment variables and hostname-based DNS names for discovery rather than constructing DNS names from `metadata.name`.
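
As a rough planning aid, the character budget can be checked before deploying. The sketch below assumes the hostname is the pod name without its trailing `-<5-char-suffix>`, as described above; the helper names and the `prefill-scaling-group` name are hypothetical, and this is not how Grove itself performs validation.

```python
DNS_LABEL_MAX = 63  # limit that applies to the pod hostname
SUFFIX_LEN = 5      # random suffix appended to the pod object name


def name_lengths(components):
    """Return (hostname_len, pod_name_len) for hostname components in order."""
    hostname = "-".join(str(c) for c in components)
    # The pod object name is the hostname plus '-' and the 5-character suffix.
    return len(hostname), len(hostname) + 1 + SUFFIX_LEN


def hostname_fits(components):
    """True if the generated hostname stays within the DNS label limit."""
    return name_lengths(components)[0] <= DNS_LABEL_MAX


# Short names leave plenty of room.
print(name_lengths(["mn-disagg", 0, "prefill", 0, "pworker", 0]))
# Long names can exceed the hostname limit even at single-digit indices.
print(hostname_fits(["machine-learning-inference-service", 0,
                     "prefill-scaling-group", 0, "worker", 0]))
```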

### Naming Guidelines

1. **Use Short, Descriptive Names**: Choose concise but meaningful names
@@ -95,8 +100,8 @@ For PodCliques in a PCSG, the final pod name is:
- ✅ Good: `ml-inference`, `web-app`, `data-pipeline`
- ❌ Avoid: `machine-learning-inference-service`, `web-application-stack`

4. **Plan for Scaling**: Consider whether you'll need double-digit replica indices (adds 1 character per additional digit)
- If you plan to scale to 10+ or 100+ or 1000+ replicas, budget accordingly
4. **Plan for Scaling**: Consider whether you'll need double-digit PCS replica, PCSG replica, or PodClique pod indices (adds 1 character per additional digit)
- If you plan to scale to 10+ or 100+ or 1000+ replicas at any level, budget accordingly

5. **Unique PodClique Names Within a PodCliqueSet**: All PodClique names must be unique within a PodCliqueSet. We explain the rationale for this further in the [Hands-On Example](./03_hands-on-example.md#why-unique-podclique-names-matter).
- If you have leader/worker patterns in multiple PCSGs, you **must** use different names (e.g., `pleader`/`pworker` and `dleader`/`dworker`)
@@ -123,18 +128,18 @@ Let's plan names for a multi-node disaggregated inference system with a frontend
- Worker PodClique: `dworker` (7 chars)

**Resulting pod names:**
- Frontend: `mn-disagg-0-frontend-a7b3c` (26 chars) ✅
- Prefill leader: `mn-disagg-0-prefill-0-pleader-a7b3c` (35 chars) ✅
- Prefill worker: `mn-disagg-0-prefill-0-pworker-a7b3c` (35 chars) ✅
- Decode leader: `mn-disagg-0-decode-0-dleader-a7b3c` (34 chars) ✅
- Decode worker: `mn-disagg-0-decode-0-dworker-a7b3c` (34 chars) ✅
- Frontend: `mn-disagg-0-frontend-0-a7b9x` (28 chars) ✅
- Prefill leader: `mn-disagg-0-prefill-0-pleader-0-a7b9x` (37 chars) ✅
- Prefill worker: `mn-disagg-0-prefill-0-pworker-0-a7b9x` (37 chars) ✅
- Decode leader: `mn-disagg-0-decode-0-dleader-0-a7b9x` (36 chars) ✅
- Decode worker: `mn-disagg-0-decode-0-dworker-0-a7b9x` (36 chars) ✅

**Scaling headroom:** The longest name (`mn-disagg-0-prefill-0-pworker-a7b3c`) is 35 characters, leaving 28 characters of headroom. Each additional digit in a replica index adds 1 character:
- 2-digit indices for PCS and PCSG (10-99): 37 chars → scales to 99 PCS replicas × 99 PCSG replicas
- 3-digit indices for PCS and PCSG (100-999): 39 chars → scales to 999 × 999 replicas
- 7-digit indices for PCS and PCSG: 47 chars → scales to millions of replicas
**Scaling headroom:** The longest hostname (`mn-disagg-0-prefill-0-pworker-0`) is 31 characters, leaving 32 characters of hostname headroom. Each additional digit in a PCS replica, PCSG replica, or PodClique pod index adds 1 character:
- 2-digit indices for PCS, PCSG, and PodClique pods (10-99): 34 chars ✅
- 3-digit indices for PCS, PCSG, and PodClique pods (100-999): 37 chars ✅
- 7-digit indices for all three levels: 49 chars ✅

With these name choices, you could scale to millions of replicas on both dimensions without hitting the limit. All names are well under the 63-character limit with room for scaling growth while remaining descriptive!
With these name choices, you could scale to millions of replicas across PCS replicas, PCSG replicas, and PodClique pods without hitting the hostname limit. All hostnames are well under the 63-character limit with room for scaling growth while remaining descriptive!
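
The headroom arithmetic above can be reproduced with a short calculation. This is a sketch under the assumption that every chosen name is followed by exactly one index and that all indices have the same number of digits; the function names are hypothetical.

```python
DNS_LABEL_MAX = 63


def hostname_len(names, digits):
    """Hostname length when each name is followed by an index of `digits` digits."""
    n = len(names)
    # n names + n indices are joined by (2n - 1) '-' separators.
    return sum(len(name) for name in names) + n * digits + (2 * n - 1)


def max_index_digits(names):
    """Largest per-index digit count that still fits the 63-character limit."""
    n = len(names)
    fixed = sum(len(name) for name in names) + (2 * n - 1)
    return (DNS_LABEL_MAX - fixed) // n


names = ["mn-disagg", "prefill", "pworker"]
for digits in (1, 2, 3, 7):
    print(digits, hostname_len(names, digits))
print("max digits per index:", max_index_digits(names))
```

Running the same calculation with your own names shows how many digits of scaling each level can absorb before the hostname limit is reached.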

To deploy a PodCliqueSet with this structure and explore the naming hierarchy through `kubectl`, continue to the [Hands-On Example](./03_hands-on-example.md).

@@ -152,8 +157,8 @@ To deploy a PodCliqueSet with this structure and explore the naming hierarchy th
| **PodClique (resource, standalone)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pclq-name>` |
| **PodClique (resource, in PCSG)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pcsg-name>-<pcsg-idx>-<pclq-name>` |
| **PCSG (resource)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pcsg-name>` |
| **Pod (standalone)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pclq-name>-<suffix>` |
| **Pod (in PCSG)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pcsg-name>-<pcsg-idx>-<pclq-name>-<suffix>` |
| **Pod (standalone)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pclq-name>-<pod-idx>-<suffix>` |
| **Pod (in PCSG)** | - | ✅ | `<pcs-name>-<pcs-idx>-<pcsg-name>-<pcsg-idx>-<pclq-name>-<pod-idx>-<suffix>` |

**You control:** PodCliqueSet name, PodClique template names, PCSG template names
**Grove generates:** All resource instances with hierarchical naming
@@ -162,7 +167,7 @@ To deploy a PodCliqueSet with this structure and explore the naming hierarchy th

1. **Self-Documenting Hierarchy**: Pod names encode the complete hierarchy from PodCliqueSet → PCSG (if applicable) → PodClique → Pod, making `kubectl get pods` output immediately understandable.

2. **63-Character Limit**: Kubernetes enforces a 63-character limit on resource names. Use short, meaningful names for your resources, especially PodCliqueSet and PCSG names which appear in every generated name.
2. **Hostname Limit**: Generated pod hostnames must stay within the Kubernetes 63-character DNS label limit. Use short, meaningful names for your resources, especially PodCliqueSet and PCSG names which appear in every generated hostname and pod name.

3. **Unique PodClique Names**: All PodClique names must be unique within a PodCliqueSet. When you have similar roles in multiple PCSGs (e.g., leader/worker in both prefill and decode), use prefixes or abbreviations (e.g., `pleader`/`pworker` and `dleader`/`dworker`).

@@ -177,4 +182,3 @@ Now that you understand Grove's naming scheme and best practices:
- **See it in action**: Continue to the [Hands-On Example](./03_hands-on-example.md) to deploy an example system and observe the naming hierarchy firsthand.

- **Learn programmatic discovery**: Head to the [Environment Variables guide](../03_environment-variables-for-pod-discovery/01_overview.md) to learn how to use these names programmatically for pod discovery, including how Grove injects environment variables and how to construct FQDNs for pod-to-pod communication.

@@ -161,20 +161,20 @@ kubectl get pods -l app.kubernetes.io/part-of=mn-disagg -o wide

You should see output like:
```
NAME READY STATUS RESTARTS AGE
mn-disagg-0-decode-0-dleader-abc12 1/1 Running 0 45s
mn-disagg-0-decode-0-dworker-def34 1/1 Running 0 45s
mn-disagg-0-decode-0-dworker-ghi56 1/1 Running 0 45s
mn-disagg-0-frontend-jkl78 1/1 Running 0 45s
mn-disagg-0-frontend-mno90 1/1 Running 0 45s
mn-disagg-0-prefill-0-pleader-pqr12 1/1 Running 0 45s
mn-disagg-0-prefill-0-pworker-stu34 1/1 Running 0 45s
mn-disagg-0-prefill-0-pworker-vwx56 1/1 Running 0 45s
mn-disagg-0-prefill-0-pworker-yza78 1/1 Running 0 45s
mn-disagg-0-prefill-1-pleader-bcd90 1/1 Running 0 45s
mn-disagg-0-prefill-1-pworker-efg12 1/1 Running 0 45s
mn-disagg-0-prefill-1-pworker-hij34 1/1 Running 0 45s
mn-disagg-0-prefill-1-pworker-klm56 1/1 Running 0 45s
NAME READY STATUS RESTARTS AGE
mn-disagg-0-decode-0-dleader-0-abc7d 1/1 Running 0 45s
mn-disagg-0-decode-0-dworker-0-def8e 1/1 Running 0 45s
mn-disagg-0-decode-0-dworker-1-ghi9f 1/1 Running 0 45s
mn-disagg-0-frontend-0-jkl0g 1/1 Running 0 45s
mn-disagg-0-frontend-1-mno1h 1/1 Running 0 45s
mn-disagg-0-prefill-0-pleader-0-pqr2i 1/1 Running 0 45s
mn-disagg-0-prefill-0-pworker-0-stu3j 1/1 Running 0 45s
mn-disagg-0-prefill-0-pworker-1-vwx4k 1/1 Running 0 45s
mn-disagg-0-prefill-0-pworker-2-yza5l 1/1 Running 0 45s
mn-disagg-0-prefill-1-pleader-0-bcd6m 1/1 Running 0 45s
mn-disagg-0-prefill-1-pworker-0-efg7n 1/1 Running 0 45s
mn-disagg-0-prefill-1-pworker-1-hij8p 1/1 Running 0 45s
mn-disagg-0-prefill-1-pworker-2-klm9x 1/1 Running 0 45s
```

## Parsing the Naming Hierarchy
@@ -185,15 +185,15 @@ Looking at this output, you can immediately understand the system structure:
```
mn-disagg-0-frontend-*
```
- Simpler naming: `<pcs>-<pcs-idx>-<pclq>-<suffix>`
- Simpler naming: `<pcs>-<pcs-idx>-<pclq>-<pod-idx>-<suffix>`
- 2 frontend pods serving requests

**2. PodCliqueScalingGroup (prefill) - 2 replicas:**
```
mn-disagg-0-prefill-0-*
mn-disagg-0-prefill-1-*
```
- Deeper hierarchy: `<pcs>-<pcs-idx>-<pcsg>-<pcsg-idx>-<pclq>-<suffix>`
- Deeper hierarchy: `<pcs>-<pcs-idx>-<pcsg>-<pcsg-idx>-<pclq>-<pod-idx>-<suffix>`
- Each replica has 1 `pleader` + 3 `pworker` pods
- Two independent prefill clusters

@@ -255,12 +255,13 @@ The PCSG names clearly identify the two scaling groups.

## Name Length Analysis

Pod names have a 63-character limit (DNS label constraint). Let's verify our longest pod name fits:
Grove pod hostnames have a 63-character DNS label limit. Pod object names include an extra 5-character suffix and can be longer than 63 characters, but hostname length is the limit that matters for service discovery. Let's verify our longest generated hostname fits:

- Longest pod name: `mn-disagg-0-prefill-1-pworker-klm56`
- Characters: 35 (well under 63) ✅
- Longest pod name: `mn-disagg-0-prefill-1-pworker-2-klm9x`
- Longest hostname: `mn-disagg-0-prefill-1-pworker-2`
- Characters: 31 (well under 63) ✅

With 28 characters of headroom, this naming scheme can scale to millions of replicas on both PCS and PCSG dimensions without hitting the limit.
With 32 characters of hostname headroom, this naming scheme can scale to millions of replicas across the PCS, PCSG, and PodClique pod dimensions without hitting the limit.

## Cleanup

@@ -275,4 +276,3 @@ kubectl delete pcs mn-disagg
Now that you've seen the naming conventions in action, check out:
- The [Key Takeaways](./02_naming-conventions.md#key-takeaways) section for a summary of naming best practices
- The [Environment Variables guide](../03_environment-variables-for-pod-discovery/01_overview.md) to learn how to use these names programmatically for pod discovery

@@ -17,11 +17,11 @@ A common source of confusion is the difference between a pod's **name** and its

### Pod Name (Kubernetes Resource Identifier)

The **pod name** is the unique identifier for the Pod resource in Kubernetes (stored in `metadata.name`). When Grove creates pods, it uses Kubernetes' `generateName` feature, which appends a random 5-character suffix to ensure uniqueness:
The **pod name** is the unique identifier for the Pod resource in Kubernetes (stored in `metadata.name`). When Grove creates pods, it includes the PodClique pod index and appends a random 5-character suffix to ensure uniqueness:

```
<pclq-name>-<random-suffix>
Example: env-demo-standalone-0-frontend-abc12
<pclq-name>-<pod-index>-<random-suffix>
Example: env-demo-standalone-0-frontend-0-abc7d
```

This name is what you see when running `kubectl get pods`. However, you **cannot use this name for DNS-based pod discovery** because the random suffix is unpredictable.
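
When inspecting a running cluster, the deterministic part of an observed pod name can be recovered by dropping the random suffix. The sketch below is for debugging only and assumes the indexed name format (`...-<pod-index>-<random-suffix>`) and a suffix made of five lowercase alphanumeric characters; the helper name is hypothetical. Applications should still rely on Grove's environment variables rather than parsing `metadata.name`.

```python
import re

# Assumption: the suffix is exactly five lowercase alphanumeric characters.
_SUFFIX = re.compile(r"-[a-z0-9]{5}$")


def strip_random_suffix(pod_name: str) -> str:
    """Return the pod name without its trailing '-<random-suffix>'."""
    if not _SUFFIX.search(pod_name):
        raise ValueError(f"no 5-character suffix found in {pod_name!r}")
    return _SUFFIX.sub("", pod_name)


print(strip_random_suffix("env-demo-standalone-0-frontend-0-abc7d"))
```
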
@@ -53,7 +53,7 @@ env-demo-standalone-0-frontend-0.env-demo-standalone-0.default.svc.cluster.local
| Attribute | Pod Name | Hostname |
|-----------|----------|----------|
| Source | `metadata.name` | `spec.hostname` |
| Pattern | `<pclq-name>-<random-suffix>` | `<pclq-name>-<pod-index>` |
| Pattern | `<pclq-name>-<pod-index>-<random-suffix>` | `<pclq-name>-<pod-index>` |
| Predictable? | ❌ No (random suffix) | ✅ Yes (index-based) |
| DNS resolvable? | ❌ No | ✅ Yes (with headless service) |
| Use case | `kubectl` commands, logs | Pod discovery, pod-to-pod communication |
@@ -100,4 +100,3 @@ If a pod belongs to a PodClique that is part of a PodCliqueScalingGroup, these a
## Next Steps

Continue to the [Hands-On Examples](./03_hands-on-examples.md) to deploy example PodCliqueSets and use environment variables to construct FQDNs and discover other pods. We strongly recommend working through these examples as they demonstrate the practical techniques you'll need to implement pod discovery in your own applications.
