Question: how will Grove support intra-node NUMA-aware scheduling? #644

Hi maintainers, thanks for GREP-244.

I noticed `TopologyDomainNuma = "numa"` is defined in the API and [Story 3 (NUMA-Aware GPU Benchmarking)](https://github.com/ai-dynamo/grove/tree/main/docs/proposals/244-topology-aware-scheduling#story-3-numa-aware-gpu-benchmarking) is listed as a motivating user story, but I cannot find any controller/scheduler logic that consumes it.

The current `ClusterTopologyBinding` model requires each domain to map to a Node label key. That works for `rack`/`zone`/`host` (one Node → one label value), but NUMA nodes live *inside* a Node and are never exposed as Node labels. So Story 3 ("2 GPUs from an 8-GPU node on the same NUMA node") does not seem reachable through node-label packing alone.
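To make the mismatch concrete, here is a minimal Go sketch. It assumes a simplified, hypothetical `Node` type (not Grove's or Kubernetes' actual API) and made-up GPU-to-NUMA data; it only illustrates that one label value per node cannot describe GPUs spread across several NUMA nodes.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a Kubernetes Node.
// Each label key holds exactly one value for the whole node.
type Node struct {
	Name   string
	Labels map[string]string
	// GPUNuma maps GPU index -> NUMA node ID on this host (illustrative data).
	GPUNuma map[int]int
}

// DistinctNUMA returns how many different NUMA nodes the GPUs of n span.
func DistinctNUMA(n Node) int {
	seen := map[int]struct{}{}
	for _, numa := range n.GPUNuma {
		seen[numa] = struct{}{}
	}
	return len(seen)
}

// LabelCanExpressNUMA reports whether a single per-node label value could
// describe the NUMA placement of every GPU on the node. That is only true
// when all GPUs sit on at most one NUMA node.
func LabelCanExpressNUMA(n Node) bool {
	return DistinctNUMA(n) <= 1
}

func main() {
	n := Node{
		Name:   "gpu-node-0",
		Labels: map[string]string{"topology.kubernetes.io/zone": "zone-a"},
		// Assumed layout: 8 GPUs, 4 per NUMA node.
		GPUNuma: map[int]int{0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1},
	}
	fmt.Println("distinct NUMA nodes:", DistinctNUMA(n))
	fmt.Println("single label sufficient:", LabelCanExpressNUMA(n))
}
```

With the assumed 4+4 layout, the node spans two NUMA nodes, so a single label value on the Node cannot say which GPUs are co-located.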

Related context:

- [ai-dynamo/dynamo#10171](https://github.com/ai-dynamo/dynamo/issues/10171) — NUMA-aware placement discussion.
- [KEP-4381 Partial GPU Allocation](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/4381-dra-structured-parameters/README.md#partial-gpu-allocation).
- [kubernetes/kubernetes#132296](https://github.com/kubernetes/kubernetes/pull/132296) — standardized `resource.kubernetes.io/pcieRoot` DRA attribute for intra-node GPU↔NIC alignment.

**Question**: which direction is planned?

1. **DRA path**: rely on upstream DRA (`pcieRoot`/NUMA attributes); Grove stays at the node-domain layer and possibly injects ResourceClaim selectors (similar to auto-MNNVL).
2. **Scheduler path**: push intra-node NUMA awareness into the scheduler backend (e.g. KAI, see [kai-scheduler/KAI-Scheduler#1598](https://github.com/kai-scheduler/KAI-Scheduler/issues/1598), [Volcano numa-aware](https://github.com/volcano-sh/volcano/blob/master/docs/design/numa-aware.md)).
3. Others
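
For illustration, the selection that either path ultimately has to perform can be sketched in plain Go. This is not DRA's or KAI's actual API: the `Device` type, the `PickColocated` helper, and the `numaNode` attribute name are hypothetical, and the device data is made up. The sketch only shows the matching rule "pick k devices that share the same value of an intra-node attribute".

```go
package main

import (
	"fmt"
	"sort"
)

// Device is a hypothetical view of an allocatable device and its attributes,
// loosely modeled on the idea of per-device attributes; not a real API type.
type Device struct {
	Name  string
	Attrs map[string]string
}

// PickColocated returns k device names that share the same value for attr,
// or nil if no attribute value has at least k devices. Groups and names are
// sorted so the result is deterministic.
func PickColocated(devs []Device, attr string, k int) []string {
	groups := map[string][]string{}
	for _, d := range devs {
		v, ok := d.Attrs[attr]
		if !ok {
			continue // device does not publish this attribute
		}
		groups[v] = append(groups[v], d.Name)
	}
	values := make([]string, 0, len(groups))
	for v := range groups {
		values = append(values, v)
	}
	sort.Strings(values)
	for _, v := range values {
		if len(groups[v]) >= k {
			names := groups[v]
			sort.Strings(names)
			return names[:k]
		}
	}
	return nil
}

func main() {
	// Assumed layout for Story 3: 8 GPUs, 4 per NUMA node.
	var devs []Device
	for i := 0; i < 8; i++ {
		devs = append(devs, Device{
			Name:  fmt.Sprintf("gpu%d", i),
			Attrs: map[string]string{"numaNode": fmt.Sprint(i / 4)}, // hypothetical attribute
		})
	}
	fmt.Println(PickColocated(devs, "numaNode", 2))
}
```

In option 1 this matching would be expressed through ResourceClaim constraints and evaluated upstream; in option 2 the scheduler backend would own the equivalent logic.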


Happy to help with a prototype or follow-up GREP once the direction is clear. Thanks!
