fix: include pod clique pod index in pod name #643
Closed
AsadShahid04 wants to merge 2 commits into
Pod names are generated using GenerateName, which appends a random suffix. Before this change, two pods for the same PodClique were named e.g. ubuntu-0-worker-2tnab and ubuntu-0-worker-5mfde, making it impossible to correlate the hostname seen in logs with a specific pod without querying custom columns.

Include the per-pod index (LabelPodCliquePodIndex) directly in the GenerateName prefix so that names become ubuntu-0-worker-0-2tnab and ubuntu-0-worker-1-5mfde. This mirrors the existing convention used for PCS and PCSG replica indices and requires no label lookup to identify which pod is which.

Closes ai-dynamo#635

Signed-off-by: OpenClaw Agent <agent@openclaw.local>
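The naming scheme described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the PR: `podGenerateNamePrefix` is a hypothetical helper, and it assumes the pod index is available as an integer and that the API server appends the random suffix to whatever prefix is placed in GenerateName.

```go
package main

import "fmt"

// podGenerateNamePrefix is a hypothetical helper (not taken from the PR).
// It builds the GenerateName prefix "<pclqName>-<podIndex>-"; Kubernetes
// then appends its own random suffix when the pod is created.
func podGenerateNamePrefix(pclqName string, podIndex int) string {
	return fmt.Sprintf("%s-%d-", pclqName, podIndex)
}

func main() {
	// Two replicas of the same PodClique now get distinguishable prefixes.
	for i := 0; i < 2; i++ {
		fmt.Println(podGenerateNamePrefix("ubuntu-0-worker", i) + "<suffix>")
	}
}
```

With this shape, the index sits between the PodClique name and the random suffix, so it is visible in any listing that shows the pod name.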
Pod names now have the format <pclqName>-<podIndex>-<k8sRandom> after the previous commit added the pod clique index to GenerateName. The helper function extractPCLQNameFromPodName only stripped one trailing segment, returning <pclqName>-<podIndex> instead of <pclqName>, which broke the PodGang→PodClique reconcile mapping and left all pods permanently ScheduleGated.

Strip two segments (random suffix then pod index) to correctly recover the PodClique FQN.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: OpenClaw Agent <agent@openclaw.local>
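The two-segment stripping described above might look roughly like the sketch below. The function names are hypothetical and the real extractPCLQNameFromPodName may differ in signature and error handling. The sketch assumes that neither the pod index nor the random suffix contains a hyphen, so the last two hyphen-separated segments can be removed safely even when the PodClique name itself contains hyphens.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTrailingSegments removes the last n hyphen-separated segments from
// name. It reports false if name has fewer than n hyphens.
func stripTrailingSegments(name string, n int) (string, bool) {
	for i := 0; i < n; i++ {
		idx := strings.LastIndex(name, "-")
		if idx < 0 {
			return "", false
		}
		name = name[:idx]
	}
	return name, true
}

// pclqNameFromPodName is a hypothetical stand-in for the helper named in
// the commit message. It drops the random suffix and then the pod index.
func pclqNameFromPodName(podName string) (string, bool) {
	return stripTrailingSegments(podName, 2)
}

func main() {
	name, ok := pclqNameFromPodName("ubuntu-0-worker-0-2tnab")
	fmt.Println(name, ok) // prints "ubuntu-0-worker true"
}
```

Stripping only one segment from the same input would yield `ubuntu-0-worker-0`, which is the mismatch the commit message describes.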
Summary
- Change `GenerateName` in `buildResource` to embed the pod index in the pod name prefix (e.g. `ubuntu-0-worker-0-<suffix>` instead of `ubuntu-0-worker-<suffix>`)
- Add tests `TestGetLabels_PodIndexLabel` and `TestPodGenerateNameIncludesPodIndex`

Problem
When a pod failure occurs, the hostname in logs (e.g. `ubuntu-0-worker`) identifies the PodClique but not which replica failed. Users must resort to custom `kubectl get pods -o custom-columns` queries to map pod names to their index, which doesn't work with standard options like `-o wide`.

Solution
Include the pod's clique pod index in the `GenerateName` prefix so that Kubernetes-assigned names encode the index directly, e.g. `ubuntu-0-worker-0-<suffix>`.

The pod index is already tracked as a label (`grove.io/podclique-pod-index`) and used for hostname and env var injection. This change surfaces it in the pod name itself, consistent with how PCS and PCSG replica indices are already embedded in resource names.

Testing
- `go build ./...`: passed
- `go test ./internal/controller/podclique/components/pod/...`: passed
- New tests: `TestGetLabels_PodIndexLabel`, `TestPodGenerateNameIncludesPodIndex`

Closes #635