feat: add pod clique pod index to pod names #634

Open
steved wants to merge 1 commit into ai-dynamo:main from steved:steved/pod-name-pod-index

Conversation

@steved steved commented May 28, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

For better at-a-glance visibility, we'd like to encode the PCLQ pod index in the pod name.

Which issue(s) this PR fixes:

Fixes #635

Special notes for your reviewer:

  • This moves pod name generation out of GenerateName and into Grove. Rather than appending a 5-character suffix, we can rely on 3 random characters, since a pod name can only conflict with another pod at the same index.
  • This tightens validation to check names against replica counts. Previously, a 45-character name with many replicas could exceed the 63-character hostname limit.
    • A PCS that was previously valid may now be rejected, but such a PCS should not have worked in practice, since its pod hostnames would already have been invalid.
  • Pod names can now be up to 69 characters long, which is allowed (though discouraged) because we explicitly set the hostname and subdomain.
  • Note: there is currently no validation on /scale API calls.
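
As a rough illustration of the notes above, the sketch below builds a name in the `<pcs-name>-<pcs-index>-<pclq-name>-<pod-index>-<random>` layout with a 3-character random suffix, and checks the non-random prefix against the 63-character limit. The function names, the suffix alphabet, and the idea that the hostname is the name without its random suffix are assumptions made for this example; they are not taken from the Grove code.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

const (
	// maxHostnameLen is the 63-character hostname limit mentioned in the PR.
	maxHostnameLen = 63
	// suffixLen is the 3-character random suffix length from the PR notes.
	suffixLen = 3
	// suffixAlphabet is an assumption for this sketch (lowercase letters and digits).
	suffixAlphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
)

// podNamePrefix (hypothetical helper) joins the deterministic parts of the name.
func podNamePrefix(pcsName string, pcsIndex int, pclqName string, podIndex int) string {
	return fmt.Sprintf("%s-%d-%s-%d", pcsName, pcsIndex, pclqName, podIndex)
}

// randomSuffix returns n characters drawn from suffixAlphabet.
// The modulo introduces a slight bias, which is acceptable for a sketch.
func randomSuffix(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	for i, b := range buf {
		buf[i] = suffixAlphabet[int(b)%len(suffixAlphabet)]
	}
	return string(buf), nil
}

// podName (hypothetical helper) appends "-<random>" to the prefix.
func podName(prefix string) (string, error) {
	suffix, err := randomSuffix(suffixLen)
	if err != nil {
		return "", err
	}
	return prefix + "-" + suffix, nil
}

// fitsHostname assumes the hostname is the prefix without the random suffix.
func fitsHostname(prefix string) bool {
	return len(prefix) <= maxHostnameLen
}

func main() {
	prefix := podNamePrefix("demo", 0, "worker", 12)
	name, err := podName(prefix)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, fitsHostname(prefix))
}
```

Because the pod index is part of the prefix, two names can only collide when they share the same index, which is the reasoning given for shortening the suffix.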

Does this PR introduce an API change?

Pod names now contain the pod clique pod index and may exceed the 63-character DNS limit

Additional documentation e.g., enhancement proposals, usage docs, etc.:

N/A

@steved force-pushed the steved/pod-name-pod-index branch 2 times, most recently from ab1c617 to d161433, May 30, 2026 02:41
@steved changed the title from "feat: add pod index to pod name prefix" to "feat: add pod clique index to created pod names", May 30, 2026
@steved force-pushed the steved/pod-name-pod-index branch from d161433 to 814344e, May 30, 2026 02:42
@steved force-pushed the steved/pod-name-pod-index branch from 814344e to 02d199a, May 30, 2026 02:44
@steved changed the title from "feat: add pod clique index to created pod names" to "feat: add pod clique pod index to pod names", May 30, 2026
@steved marked this pull request as ready for review, May 30, 2026 02:44
if len(podGroup.PodReferences) == 0 {
	continue
}
podRefName := podGroup.PodReferences[0].Name
Contributor

Why was this removed?

Contributor Author

The podgroup name is already the fully qualified podclique name (since a09e370) so I think it was just an oversight that this still parses the pod references. Happy to back it out.

resourceNameLength := len(pcsName) + len(pclqName)
// <pcs-name>-<pcs-index>-<pclq-name>-<pod-index>-<random>
func validatePodNameConstraints(pcsName string, pcsReplicas int32, pcsgName string, pcsgReplicas int32, pclqName string, pclqReplicas int32) error {
pclqOwnerNameReplica := apicommon.ResourceNameReplica{Name: pcsName, Replica: int(max(0, pcsReplicas-1))}
Contributor

Why are you using the max of 0 and the max replica count instead of the exact PCS replica index? The same question applies to every use of max replicas in the ResourceNameReplica builder.

Contributor Author

Some of these could be removed. Podclique replicas must be >= 1, but PCS replicas can be 0. This was to ensure we validate the exact index (replicas: 10 creates PCS 0 - 9) and we don't validate against -1 if replicas: 0 is set.
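
The clamping described in this reply can be sketched in isolation. The helper name `highestReplicaIndex` is hypothetical; only the `max(0, replicas-1)` expression comes from the diff. The built-in `max` requires Go 1.21 or later.

```go
package main

import "fmt"

// highestReplicaIndex (hypothetical helper) returns the largest zero-based
// index for a replica count, clamped at 0 so that replicas: 0 does not
// produce an index of -1.
func highestReplicaIndex(replicas int32) int {
	return int(max(0, replicas-1))
}

func main() {
	for _, r := range []int32{0, 1, 10} {
		fmt.Printf("replicas=%d highest index=%d\n", r, highestReplicaIndex(r))
	}
	// Prints:
	// replicas=0 highest index=0
	// replicas=1 highest index=0
	// replicas=10 highest index=9
}
```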


// MaxReplicas returns the maximum number of replicas for this PodClique.
func (p PodCliqueSpec) MaxReplicas() int32 {
replicas := p.Replicas
Contributor

Small question: this is a bit asymmetric with PodCliqueScalingGroupConfig.MaxReplicas() below, which floors at 1 via max(1, ptr.Deref(p.Replicas, 0)). Is that intentional (e.g. zero replicas is a valid scaled-to-zero state for a PodClique), or would it be worth flooring here too for consistency? Happy either way, just wanted to flag it. Thanks!

Contributor Author

Ah yeah, they do have slightly different behaviors. 0 isn't valid for replicas and the field is required, but the default for scaling configs is only set in the API so maybe we just trust that and deref it directly.

if *scalingGroupConfig.Replicas <= 0 {

// +kubebuilder:default=1
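
To make the asymmetry in this thread concrete, here is a simplified sketch. The types are trimmed-down, hypothetical stand-ins for PodCliqueSpec and PodCliqueScalingGroupConfig, the real MaxReplicas() methods may take more into account than shown here, and `deref` is a local stand-in for `ptr.Deref`.

```go
package main

import "fmt"

// podCliqueSpec is a hypothetical stand-in with a required replica count.
type podCliqueSpec struct {
	Replicas int32
}

// scalingGroupConfig is a hypothetical stand-in with an optional replica
// count, whose default is assumed to be applied by the API server.
type scalingGroupConfig struct {
	Replicas *int32
}

// deref returns *p, or def when p is nil (local stand-in for ptr.Deref).
func deref(p *int32, def int32) int32 {
	if p == nil {
		return def
	}
	return *p
}

// maxReplicas returns the value as-is, with no floor.
func (p podCliqueSpec) maxReplicas() int32 {
	return p.Replicas
}

// maxReplicasFloored floors the result at 1, as in the reviewer's quote.
func (c scalingGroupConfig) maxReplicasFloored() int32 {
	return max(1, deref(c.Replicas, 0))
}

func main() {
	zero, three := int32(0), int32(3)
	fmt.Println(podCliqueSpec{Replicas: 0}.maxReplicas())                  // 0
	fmt.Println(scalingGroupConfig{}.maxReplicasFloored())                 // 1
	fmt.Println(scalingGroupConfig{Replicas: &zero}.maxReplicasFloored())  // 1
	fmt.Println(scalingGroupConfig{Replicas: &three}.maxReplicasFloored()) // 3
}
```

The floor only matters for a nil or non-positive value, which is why trusting the API default and validation, as suggested in the reply, would behave the same for valid objects.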

@renormalize renormalize self-assigned this Jun 2, 2026

Development

Successfully merging this pull request may close these issues.

Add pod clique pod index to pod name for better visibility

4 participants