[GREP] scheduler plugin - KAI scheduler #553

Open
daisy-ycguo wants to merge 1 commit into ai-dynamo:main from daisy-ycguo:grep

@daisy-ycguo daisy-ycguo commented Apr 27, 2026

Adds GREP-525 proposal for KAI scheduler plugin.

Fixes #525

copy-pr-bot Bot commented Apr 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

daisy-ycguo requested a review from danbar2 as a code owner May 6, 2026 03:10
daisy-ycguo changed the title from "[WIP] [GREP] scheduler plugin - KAI scheduler" to "[GREP] scheduler plugin - KAI scheduler" May 6, 2026
daisy-ycguo force-pushed the grep branch 2 times, most recently from a663542 to db3c977 May 6, 2026 03:14
@kangclzjc
Contributor

@daisy-ycguo Could you please create an issue and link it in this PR?

Comment thread docs/proposals/525-KAI-Scheduler-Backend/README.md Outdated
Comment thread docs/proposals/525-KAI-Scheduler-Backend/README.md
Comment thread docs/proposals/525-KAI-Scheduler-Backend/README.md
Comment thread docs/proposals/525-KAI-Scheduler-Backend/README.md
Comment thread docs/proposals/525-KAI-Scheduler-Backend/README.md Outdated
Comment thread docs/proposals/525-KAI-Scheduler-Backend/README.md Outdated
Signed-off-by: Daisy Guo <daiguo@nvidia.com>

Grove will ship a built-in `kai-scheduler` backend that implements the Scheduler Backend Framework lifecycle hooks needed to manage KAI PodGroups. The backend is responsible for converting Grove PodGang intent to KAI PodGroup resources, preparing Pods to use KAI, participating in admission validation, and keeping KAI PodGroups in sync with Grove lifecycle events.

This proposal only covers KAI PodGroup creation and management. It does not propose any KAI Topology creation/update flow, does not add startup-time topology synchronization, and does not define topology-aware scheduling behavior.
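
To make the four responsibilities above concrete, here is a minimal Go sketch of what such a backend could look like. It is not the Grove or KAI API: the `SchedulerBackend` interface, its hook names, the simplified `PodGang`, `Pod`, and `KAIPodGroup` types, the `podGroupLabel` key, and the in-memory map standing in for the API server are all hypothetical. Mapping a gang's minimum pod count to a PodGroup minimum-member field and setting the Pod's scheduler name to `kai-scheduler` are assumptions made for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// podGroupLabel is a hypothetical label key tying a Pod to its PodGroup.
const podGroupLabel = "example.com/pod-group"

// PodGang is a simplified, hypothetical stand-in for Grove's PodGang intent.
type PodGang struct {
	Name      string
	Namespace string
	MinPods   int
}

// Pod is a minimal stand-in for a Kubernetes Pod.
type Pod struct {
	Name          string
	SchedulerName string
	Labels        map[string]string
}

// KAIPodGroup is a hypothetical, reduced view of a KAI PodGroup.
type KAIPodGroup struct {
	Name      string
	Namespace string
	MinMember int
}

// SchedulerBackend names one hook per responsibility in the proposal text.
// The hook names are assumptions, not the framework's real interface.
type SchedulerBackend interface {
	Name() string
	ValidatePodGang(pg PodGang) error // admission validation
	PreparePod(pg PodGang, pod *Pod)  // prepare Pods to use KAI
	SyncPodGang(pg PodGang) error     // create or update the PodGroup
	DeletePodGang(pg PodGang)         // follow Grove lifecycle on delete
}

// kaiBackend keeps PodGroups in a map instead of calling the API server.
type kaiBackend struct {
	podGroups map[string]KAIPodGroup
}

// Compile-time check that kaiBackend satisfies the sketch interface.
var _ SchedulerBackend = (*kaiBackend)(nil)

func NewKAIBackend() *kaiBackend {
	return &kaiBackend{podGroups: map[string]KAIPodGroup{}}
}

func key(pg PodGang) string { return pg.Namespace + "/" + pg.Name }

func (b *kaiBackend) Name() string { return "kai-scheduler" }

func (b *kaiBackend) ValidatePodGang(pg PodGang) error {
	if pg.Name == "" {
		return errors.New("PodGang name must not be empty")
	}
	if pg.MinPods < 1 {
		return fmt.Errorf("PodGang %q: MinPods must be at least 1, got %d", pg.Name, pg.MinPods)
	}
	return nil
}

func (b *kaiBackend) PreparePod(pg PodGang, pod *Pod) {
	pod.SchedulerName = b.Name()
	if pod.Labels == nil {
		pod.Labels = map[string]string{}
	}
	pod.Labels[podGroupLabel] = pg.Name
}

// SyncPodGang is idempotent: repeated calls converge on the latest intent.
func (b *kaiBackend) SyncPodGang(pg PodGang) error {
	if err := b.ValidatePodGang(pg); err != nil {
		return err
	}
	b.podGroups[key(pg)] = KAIPodGroup{Name: pg.Name, Namespace: pg.Namespace, MinMember: pg.MinPods}
	return nil
}

func (b *kaiBackend) DeletePodGang(pg PodGang) {
	delete(b.podGroups, key(pg))
}
```

In this sketch a controller would call `SyncPodGang` on every PodGang create or update and `DeletePodGang` on removal, while a webhook would call `ValidatePodGang` and `PreparePod`. Keeping the sync hook idempotent is one way to keep PodGroups aligned with Grove lifecycle events without tracking separate create and update paths.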
Development

Successfully merging this pull request may close these issues.

KAI scheduler backend: sync PodGang to KAI PodGroup

3 participants