GREP-531 Workload API gang scheduling for default-scheduler backend#605

Open
yankay wants to merge 1 commit into
ai-dynamo:mainfrom
yankay:grep/531-kube-scheduler-workload-gang

Conversation

@yankay
Contributor

@yankay yankay commented May 12, 2026

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Introduces GREP-531: Workload API Gang Scheduling for the default-scheduler backend.

This GREP designs the implementation of gang scheduling via upstream scheduling.k8s.io Workload / PodGroup resources for Grove's existing default-scheduler backend (introduced in GREP-375). Key decisions:

  • Phase 1 (Kubernetes 1.35, v1alpha1): flat Workload/PodGroup mapping, Pod.Spec.WorkloadRef membership, immutable gang shape.
  • Phase 2 (Kubernetes 1.36+, conditional on KEP-5832): decoupled standalone PodGroup, Pod.Spec.SchedulingGroup membership.
  • Phase 3 (Kubernetes 1.37+): CompositePodGroup hierarchical gang, TAS.

Design aligns with GREP-375 Beta criterion, LWS KEP-666 lifecycle invariants, and KEP-4671 upstream gang API.

Also renames the proposal directory from 531-kube-scheduler-workload-gang to 531-default-scheduler-workload-gang to match the backend name.

Supersedes the draft implementation in #532 — design decisions here take precedence where they differ.

Which issue(s) this PR fixes:

Fixes #531

Special notes for your reviewer:

This is a design/GREP-only PR; no code changes. The companion implementation PR will follow once this GREP is accepted.

Does this PR introduce an API change?

NONE

Additional documentation e.g., enhancement proposals, usage docs, etc.:

docs/proposals/531-default-scheduler-workload-gang/README.md

@copy-pr-bot

copy-pr-bot Bot commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5a8035adee

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread docs/proposals/531-default-scheduler-workload-gang/README.md Outdated
@yankay yankay changed the title docs(grep): add GREP-531 Workload API gang scheduling for default-scheduler backend GREP-531 Workload API gang scheduling for default-scheduler backend May 12, 2026
@yankay yankay force-pushed the grep/531-kube-scheduler-workload-gang branch from 5a8035a to 85a4af2 Compare May 12, 2026 09:20
@yankay yankay force-pushed the grep/531-kube-scheduler-workload-gang branch 2 times, most recently from c9ecae0 to 89988e7 Compare May 12, 2026 10:15
@yankay yankay force-pushed the grep/531-kube-scheduler-workload-gang branch from 89988e7 to a2ef915 Compare May 12, 2026 11:34
|---|---|
| Upstream KEPs at very different maturity (full table in [Appendix](#upstream-kubernetes-keps)); load-bearing unreleased ones are [KEP-6012][kep-6012] (hierarchical) and [KEP-5732][kep-5732] (TAS). [KEP-4671][kep-4671] is still alpha (`beta: v1.37`, `stable: v1.38`), and v1alpha2 — the only live API version — ships in Kubernetes 1.36 but is not yet in Grove's `k8s.io/api` baseline. | Target v1alpha2 directly once Grove's modules adopt the v0.36.x baseline (skipping the abandoned v1alpha1), and align with sibling integrations on the same upstream APIs (see [Appendix](#sibling-integrations)) so divergence is intentional. |
| [KEP-4671 §PodGroup Creation Ordering][kep-4671] requires `Workload` → `PodGroup` → `Pod` creation order; pods created before the `PodGroup` are marked `UnschedulableAndUnresolvable` and re-enqueued when the `PodGroup` appears. | Backend creates gang resources in `SyncPodGang` **before** `PodGang.Initialized=True`; pods stay scheduling-gated, so the race window is harmless. Owner is `PodGang`, not a Pod. |
| Upstream marks gang shape fields immutable: `Workload.Spec.PodGroupTemplates`, `PodGroupTemplate.Name`, `PodGroup.Spec.SchedulingPolicy`, and `Pod.Spec.SchedulingGroup` are all immutable in v1alpha2. | Grove rejects in-place updates to gang shape and requires recreating the Grove workload; future phases may relax this only if a later upstream version explicitly marks the fields mutable. |
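
The immutability row above could be enforced with a simple shape comparison at update time. The following is a minimal sketch in Python, not Grove or Kubernetes code: `GangShape` and `validate_gang_shape_update` are hypothetical names, and the two fields are an assumed summary of the immutable fields the row lists.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GangShape:
    """Hypothetical summary of the fields the table row lists as immutable."""
    pod_group_template_names: Tuple[str, ...]  # stands in for Workload.Spec.PodGroupTemplates
    scheduling_policy: str                     # stands in for PodGroup.Spec.SchedulingPolicy


def validate_gang_shape_update(old: GangShape, new: GangShape) -> List[str]:
    """Return one error per changed immutable field; an empty list means the update is allowed."""
    errors = []
    if old.pod_group_template_names != new.pod_group_template_names:
        errors.append("PodGroupTemplates are immutable; recreate the workload")
    if old.scheduling_policy != new.scheduling_policy:
        errors.append("SchedulingPolicy is immutable; recreate the workload")
    return errors
```

Returning a list rather than raising on the first mismatch lets a caller report every offending field in one rejection message.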
Contributor


Currently Grove creates an empty PodGang, then creates the Pods, and then prepares the Pod spec. That is not a problem today because PodGroupTemplate.Name matches the Grove PodGroup. @unmarshall, could you also please take a look at this?
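
For readers following this thread, the ordering described in the diff (gang resources first, pods released from their scheduling gate only afterwards) can be sketched as below. This is an illustrative Python model under stated assumptions, not Grove code: `Pod`, `GangState`, `sync_pod_gang`, and `ungate_pods` are hypothetical names, and the event list stands in for API server calls.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Pod:
    name: str
    gated: bool = True  # assumption: pods start with a scheduling gate


@dataclass
class GangState:
    events: List[Tuple[str, str]] = field(default_factory=list)
    initialized: bool = False  # models PodGang.Initialized


def sync_pod_gang(state: GangState, gang: str, groups: List[str]) -> None:
    """Record Workload, then PodGroups, and only then mark the gang initialized."""
    state.events.append(("Workload", gang))
    for group in groups:
        state.events.append(("PodGroup", group))
    state.initialized = True


def ungate_pods(state: GangState, pods: List[Pod]) -> int:
    """Remove gates only after gang resources exist; return how many pods were released."""
    if not state.initialized:
        return 0  # pods may already exist, but they stay gated
    released = 0
    for pod in pods:
        if pod.gated:
            pod.gated = False
            state.events.append(("PodUngated", pod.name))
            released += 1
    return released
```

In this model pods can be created at any time, which mirrors the claim that the race window is harmless: nothing reaches the scheduler until the gate is removed.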

…eduler backend

Introduces GREP-531: Workload API Gang Scheduling for the default-scheduler
backend. Key design decisions:

- Phase 1 (Kubernetes 1.35, v1alpha1): flat Workload/PodGroup mapping,
  Pod.Spec.WorkloadRef membership, immutable gang shape with recreate-workload
  semantics.
- Phase 2 (Kubernetes 1.36+, conditional on KEP-5832): decoupled standalone
  PodGroup, Pod.Spec.SchedulingGroup membership.
- Phase 3 (Kubernetes 1.37+): CompositePodGroup hierarchical gang, TAS.

Also renames the directory from 531-kube-scheduler-workload-gang to
531-default-scheduler-workload-gang to match the backend name.

Relates to ai-dynamo#531, ai-dynamo#395

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
@yankay yankay force-pushed the grep/531-kube-scheduler-workload-gang branch from 7215e40 to 3cbbea4 Compare May 18, 2026 12:32

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gang support in Workload API support for kube-scheduler

2 participants