feat(BA-4904): add GraphQL ResourceSlotType node with root queries and connections by HyeockJinKim · Pull Request #9708 · lablup/backend.ai

HyeockJinKim · 2026-03-05T15:54:15Z

Summary

Add ResourceSlotTypeGQL node exposing all resource_slot_types table columns (slot_name, slot_type, display_name, description, display_unit, display_icon, number_format, rank)
Add root queries resource_slot_type(slot_name) and resource_slot_types(filter, order, pagination) with Connection support
Add AgentResourceSlotGQL node and resource_slots connection field on AgentV2GQL
Add KernelResourceAllocationGQL node and resource_allocations connection field on KernelV2GQL
Shared fetcher pattern in api/gql/resource_slot/fetcher.py reused across root queries and connection resolvers

Test plan

ResourceSlotTypeGQL node exposes all resource_slot_types columns
Root query resource_slot_types returns Connection with filter/order/pagination
Root query resource_slot_type(slot_name) returns single node or null
AgentV2GQL has resource_slots field returning AgentResourceSlotConnectionGQL
KernelV2GQL has resource_allocations field returning KernelResourceAllocationConnectionGQL
Fetcher functions shared (no duplication) between root queries and connection resolvers
All types registered in GQL schema and queryable via introspection
pants lint and pants check pass

Resolves BA-4904

📚 Documentation preview 📚: https://sorna--9708.org.readthedocs.build/en/9708/

📚 Documentation preview 📚: https://sorna-ko--9708.org.readthedocs.build/ko/9708/

Copilot

Pull request overview

Adds new GraphQL surface area for resource slot metadata and per-entity slot usage/allocation, backed by new service actions and shared fetcher helpers.

Changes:

Introduces ResourceSlotTypeGQL (+ NumberFormat) and root queries resource_slot_type / resource_slot_types.
Adds Relay-style connection fields on AgentV2GQL (resource_slots) and KernelV2GQL (resource_allocations).
Extends resource-slot service/processors/actions to support fetching slot-type registry entries.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.


File	Description
src/ai/backend/manager/services/resource_slot/service.py	Adds service methods to fetch all slot types and a single slot type, mapping repository rows into data objects.
src/ai/backend/manager/services/resource_slot/processors.py	Registers new action processors for slot-type actions.
src/ai/backend/manager/services/resource_slot/actions/get_slot_type.py	Adds action/result for fetching a single slot type.
src/ai/backend/manager/services/resource_slot/actions/all_slot_types.py	Adds action/result for fetching all slot types.
src/ai/backend/manager/services/resource_slot/actions/__init__.py	Exposes new actions/results via package exports.
src/ai/backend/manager/data/resource_slot/types.py	Adds `NumberFormatData` and extends `ResourceSlotTypeData` with additional fields.
src/ai/backend/manager/api/gql/schema.py	Wires new root query resolvers into the GraphQL schema.
src/ai/backend/manager/api/gql/resource_slot/types.py	Introduces new GraphQL Node + Connection types for slot types, agent resources, and kernel allocations.
src/ai/backend/manager/api/gql/resource_slot/resolver.py	Adds root query resolvers for slot type queries.
src/ai/backend/manager/api/gql/resource_slot/fetcher.py	Adds shared fetcher utilities for root queries and connection fields.
src/ai/backend/manager/api/gql/resource_slot/__init__.py	Adds package marker.
src/ai/backend/manager/api/gql/kernel/types.py	Adds `resource_allocations` connection field on `KernelV2GQL`.
src/ai/backend/manager/api/gql/agent/types.py	Adds `resource_slots` connection field on `AgentV2GQL`.
src/ai/backend/common/data/permission/types.py	Adds new RBAC entity type `RESOURCE_SLOT_TYPE`.
changes/9708.feature.md	Adds changelog entry for the new GraphQL nodes/queries/fields.


Copilot · 2026-03-05T15:57:17Z

src/ai/backend/manager/api/gql/resource_slot/resolver.py

+
+@strawberry.field(description="Added in 26.4.0. Returns all registered resource slot types.")  # type: ignore[misc]
+async def resource_slot_types(
+    info: Info[StrawberryGQLContext],


This root field returns a Relay Connection but does not accept any pagination arguments (e.g., first/after/last/before). As written, clients cannot paginate and page_info is always hardcoded in the fetcher; either expose proper Relay pagination parameters (and implement slicing + has_next_page/has_previous_page) or change the API to return a plain list instead of a Connection.

Suggested change

-    info: Info[StrawberryGQLContext],
+    info: Info[StrawberryGQLContext],
+    first: int | None = None,
+    after: str | None = None,
+    last: int | None = None,
+    before: str | None = None,

Copilot · 2026-03-05T15:57:18Z

src/ai/backend/manager/api/gql/resource_slot/fetcher.py

+    page_info = strawberry.relay.PageInfo(
+        has_next_page=False,
+        has_previous_page=False,
+        start_cursor=edges[0].cursor if edges else None,
+        end_cursor=edges[-1].cursor if edges else None,
+    )


These connections are constructed with has_next_page/has_previous_page hardcoded to False, which produces misleading Relay semantics once the dataset grows. If the field is intended to be a real Relay connection, compute these flags based on the requested window (first/after/...) and the underlying total; otherwise consider returning a list type to avoid implying pagination support.
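To make the point about computing the flags concrete, here is a minimal sketch of forward pagination (`first`/`after`) over an already-fetched list. It is not the fetcher from this PR: `PageInfo` below is a plain stand-in dataclass rather than `strawberry.relay.PageInfo`, and `encode_cursor`, `decode_cursor`, and `paginate_forward` are hypothetical helper names with an assumed index-based cursor format.

```python
from __future__ import annotations

import base64
from dataclasses import dataclass
from typing import Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class PageInfo:
    """Stand-in for a Relay PageInfo object (assumption, not the Strawberry type)."""

    has_next_page: bool
    has_previous_page: bool
    start_cursor: str | None
    end_cursor: str | None


def encode_cursor(index: int) -> str:
    # Assumed cursor format: base64 of "idx:<position>".
    return base64.b64encode(f"idx:{index}".encode()).decode()


def decode_cursor(cursor: str) -> int:
    prefix, _, raw = base64.b64decode(cursor).decode().partition(":")
    if prefix != "idx":
        raise ValueError(f"malformed cursor: {cursor!r}")
    return int(raw)


def paginate_forward(
    items: Sequence[T],
    first: int | None = None,
    after: str | None = None,
) -> tuple[list[T], PageInfo]:
    """Slice `items` to the requested window and derive the page flags from it."""
    start = decode_cursor(after) + 1 if after is not None else 0
    end = len(items) if first is None else min(start + first, len(items))
    window = list(items[start:end])
    page_info = PageInfo(
        has_next_page=end < len(items),  # items remain after the window
        has_previous_page=start > 0,  # items exist before the window
        start_cursor=encode_cursor(start) if window else None,
        end_cursor=encode_cursor(end - 1) if window else None,
    )
    return window, page_info
```

Backward pagination (`last`/`before`) would mirror this from the other end of the list; it is left out to keep the sketch short.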

Copilot · 2026-03-05T15:57:18Z

src/ai/backend/manager/api/gql/resource_slot/types.py

+    @classmethod
+    async def resolve_nodes(  # type: ignore[override]
+        cls,
+        *,
+        info: Info[StrawberryGQLContext],
+        node_ids: Iterable[str],
+        required: bool = False,
+    ) -> Iterable[Self | None]:


The required flag is part of Strawberry's Node resolution contract, but it is currently ignored in all three resolve_nodes implementations in this file. When required=True and a node is missing, the resolver should raise (instead of returning None) to match the expected behavior for non-null node lookups.
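A minimal sketch of the behavior described above, independent of Strawberry: missing nodes become `None` unless `required=True`, in which case the lookup raises. `NodeNotFound` and `collect_nodes` are hypothetical names used only for illustration, and the synchronous `lookup` callable stands in for whatever data loader the real `resolve_nodes` would await.

```python
from __future__ import annotations

from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class NodeNotFound(LookupError):
    """Hypothetical error raised when a required node does not exist."""


def collect_nodes(
    node_ids: Iterable[str],
    lookup: Callable[[str], T | None],
    required: bool = False,
) -> list[T | None]:
    """Resolve each id in order; raise on a miss only when `required` is set."""
    results: list[T | None] = []
    for node_id in node_ids:
        node = lookup(node_id)
        if node is None and required:
            raise NodeNotFound(f"node not found: {node_id}")
        results.append(node)
    return results
```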

Copilot · 2026-03-05T15:57:18Z

src/ai/backend/manager/api/gql/resource_slot/types.py

+        results: list[Self | None] = []
+        for slot_name in node_ids:
+            data = await load_resource_slot_type_data(info, slot_name)
+            results.append(cls.from_data(data) if data is not None else None)
+        return results


This performs one awaited fetch per node_id, causing an N+1 pattern for Relay node resolution. Prefer batching: fetch all requested slot_names in one service/repository call (or through a request-scoped DataLoader), then map results back to the original node_ids order.
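The batching idea can be sketched as follows, assuming a batch fetch function exists that takes a list of slot names and returns a mapping of the ones it found. `resolve_in_order` and the `batch_fetch` parameter are hypothetical; no such service call is confirmed by this PR.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Iterable, Mapping, TypeVar

T = TypeVar("T")


async def resolve_in_order(
    node_ids: Iterable[str],
    batch_fetch: Callable[[list[str]], Awaitable[Mapping[str, T]]],
) -> list[T | None]:
    """Fetch all requested ids in one call, then restore the caller's order."""
    ids = list(node_ids)
    # dict.fromkeys removes duplicate ids while keeping first-seen order.
    unique_ids = list(dict.fromkeys(ids))
    found = await batch_fetch(unique_ids)
    # Ids the backend did not return map to None, as Relay node lookups expect.
    return [found.get(node_id) for node_id in ids]
```

A request-scoped DataLoader gives the same effect across separate resolvers; the sketch only covers the single-call case inside one `resolve_nodes` invocation.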

Copilot · 2026-03-05T15:57:19Z

src/ai/backend/manager/api/gql/resource_slot/fetcher.py

+    """Load raw AgentResourceData for a single agent+slot (used by Node.resolve_nodes)."""
+    action_result = (
+        await info.context.processors.resource_slot.get_agent_resources.wait_for_complete(
+            GetAgentResourcesAction(agent_id=agent_id)
+        )
+    )
+    for data in action_result.items:
+        if data.slot_name == slot_name:
+            return data
+    return None
+
+
+async def load_kernel_allocation_data(
+    info: Info[StrawberryGQLContext],
+    kernel_id_str: str,
+    slot_name: str,
+) -> ResourceAllocationData | None:
+    """Load raw ResourceAllocationData for a single kernel+slot (used by Node.resolve_nodes)."""
+    action_result = (
+        await info.context.processors.resource_slot.get_kernel_allocations.wait_for_complete(
+            GetKernelAllocationsAction(kernel_id=_uuid.UUID(kernel_id_str))
+        )
+    )
+    for data in action_result.items:
+        if data.slot_name == slot_name:
+            return data
+    return None


For per-slot Node resolution you re-fetch the full agent resource list and linearly scan it for every requested slot. This is O(N*M) across multiple nodes and can produce repeated identical backend calls. Consider adding a dedicated service method/action to fetch a single slot (agent_id + slot_name), or batch: fetch once per agent_id and build a dict keyed by slot_name.

Suggested change

-    """Load raw AgentResourceData for a single agent+slot (used by Node.resolve_nodes)."""
-    action_result = (
-        await info.context.processors.resource_slot.get_agent_resources.wait_for_complete(
-            GetAgentResourcesAction(agent_id=agent_id)
-        )
-    )
-    for data in action_result.items:
-        if data.slot_name == slot_name:
-            return data
-    return None
-
-
-async def load_kernel_allocation_data(
-    info: Info[StrawberryGQLContext],
-    kernel_id_str: str,
-    slot_name: str,
-) -> ResourceAllocationData | None:
-    """Load raw ResourceAllocationData for a single kernel+slot (used by Node.resolve_nodes)."""
-    action_result = (
-        await info.context.processors.resource_slot.get_kernel_allocations.wait_for_complete(
-            GetKernelAllocationsAction(kernel_id=_uuid.UUID(kernel_id_str))
-        )
-    )
-    for data in action_result.items:
-        if data.slot_name == slot_name:
-            return data
-    return None
+    """Load raw AgentResourceData for a single agent+slot (used by Node.resolve_nodes).
+
+    To avoid repeated backend calls and linear scans when resolving multiple slots for
+    the same agent within a single request, this function caches the full list of
+    resources per agent_id on the GraphQL context and indexes them by slot_name.
+    """
+    # Per-request cache: info.context._agent_resources_cache
+    ctx = info.context
+    cache = getattr(ctx, "_agent_resources_cache", None)
+    if cache is None:
+        cache = {}
+        setattr(ctx, "_agent_resources_cache", cache)
+    agent_cache = cache.get(agent_id)
+    if agent_cache is None:
+        action_result = (
+            await ctx.processors.resource_slot.get_agent_resources.wait_for_complete(
+                GetAgentResourcesAction(agent_id=agent_id)
+            )
+        )
+        # Index resources by slot_name for O(1) lookup.
+        agent_cache = {item.slot_name: item for item in action_result.items}
+        cache[agent_id] = agent_cache
+    return agent_cache.get(slot_name)
+
+
+async def load_kernel_allocation_data(
+    info: Info[StrawberryGQLContext],
+    kernel_id_str: str,
+    slot_name: str,
+) -> ResourceAllocationData | None:
+    """Load raw ResourceAllocationData for a single kernel+slot (used by Node.resolve_nodes).
+
+    Similar to load_agent_resource_data(), this caches allocations per kernel_id within
+    a single request to prevent repeated backend calls and linear scans when multiple
+    slots are resolved for the same kernel.
+    """
+    # Per-request cache: info.context._kernel_allocations_cache
+    ctx = info.context
+    cache = getattr(ctx, "_kernel_allocations_cache", None)
+    if cache is None:
+        cache = {}
+        setattr(ctx, "_kernel_allocations_cache", cache)
+    kernel_cache = cache.get(kernel_id_str)
+    if kernel_cache is None:
+        action_result = (
+            await ctx.processors.resource_slot.get_kernel_allocations.wait_for_complete(
+                GetKernelAllocationsAction(kernel_id=_uuid.UUID(kernel_id_str))
+            )
+        )
+        # Index allocations by slot_name for O(1) lookup.
+        kernel_cache = {item.slot_name: item for item in action_result.items}
+        cache[kernel_id_str] = kernel_cache
+    return kernel_cache.get(slot_name)

Copilot · 2026-03-05T15:57:19Z

src/ai/backend/manager/services/resource_slot/service.py

+        items = [
+            ResourceSlotTypeData(
+                slot_name=row.slot_name,
+                slot_type=row.slot_type,
+                display_name=row.display_name,
+                description=row.description,
+                display_unit=row.display_unit,
+                display_icon=row.display_icon,
+                number_format=NumberFormatData(
+                    binary=row.number_format.binary,
+                    round_length=row.number_format.round_length,
+                ),
+                rank=row.rank,
+            )


The row→ResourceSlotTypeData mapping logic is duplicated in both all_slot_types() and get_slot_type(). Extract a small private helper (e.g., _to_resource_slot_type_data(row)) to keep the mapping consistent and reduce the chance of future drift when columns are added/changed.

HyeockJinKim · 2026-03-06T01:30:50Z

src/ai/backend/manager/api/gql/resource_slot/fetcher.py

+async def load_agent_resource_data(
+    info: Info[StrawberryGQLContext],
+    agent_id: str,
+    slot_name: str,
+) -> AgentResourceData | None:
+    """Load raw AgentResourceData for a single agent+slot (used by Node.resolve_nodes)."""
+    action_result = (
+        await info.context.processors.resource_slot.get_agent_resources.wait_for_complete(
+            GetAgentResourcesAction(agent_id=agent_id)
+        )
+    )
+    for data in action_result.items:
+        if data.slot_name == slot_name:
+            return data
+    return None


It looks like we need a separate query and service action that takes slot_name as input as well, instead of fetching all of the agent's resources and filtering afterwards.
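As a rough sketch of what a slot-level lookup could look like: the action below carries both `agent_id` and `slot_name`, so the repository can return at most one row. Every name here (`GetAgentResourceBySlotAction`, `InMemoryResourceSlotRepository`, `get_agent_resource_by_slot`) is hypothetical, the `AgentResourceData` fields are assumed, and the in-memory dict stands in for a real keyed database query.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class AgentResourceData:
    """Simplified stand-in; the real fields may differ."""

    slot_name: str
    capacity: Decimal
    used: Decimal


@dataclass(frozen=True)
class GetAgentResourceBySlotAction:
    """Hypothetical action keyed by agent_id and slot_name."""

    agent_id: str
    slot_name: str


class InMemoryResourceSlotRepository:
    """Stand-in repository; a real one would filter on both keys in the query."""

    def __init__(self, rows: dict[tuple[str, str], AgentResourceData]) -> None:
        self._rows = rows

    async def get_agent_resource(
        self, agent_id: str, slot_name: str
    ) -> AgentResourceData | None:
        return self._rows.get((agent_id, slot_name))


async def get_agent_resource_by_slot(
    repository: InMemoryResourceSlotRepository,
    action: GetAgentResourceBySlotAction,
) -> AgentResourceData | None:
    """Service-level handler: one keyed lookup, no list scan."""
    return await repository.get_agent_resource(action.agent_id, action.slot_name)
```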

HyeockJinKim · 2026-03-06T01:31:12Z

src/ai/backend/manager/api/gql/resource_slot/fetcher.py

+async def load_kernel_allocation_data(
+    info: Info[StrawberryGQLContext],
+    kernel_id_str: str,
+    slot_name: str,
+) -> ResourceAllocationData | None:
+    """Load raw ResourceAllocationData for a single kernel+slot (used by Node.resolve_nodes)."""
+    action_result = (
+        await info.context.processors.resource_slot.get_kernel_allocations.wait_for_complete(
+            GetKernelAllocationsAction(kernel_id=_uuid.UUID(kernel_id_str))
+        )
+    )
+    for data in action_result.items:
+        if data.slot_name == slot_name:
+            return data
+    return None


Same here: it seems the action should take slot_name as input rather than fetching everything and filtering by slot_name afterwards.

HyeockJinKim · 2026-03-06T01:31:50Z

src/ai/backend/manager/api/gql/resource_slot/resolver.py

+
+
+@strawberry.field(
+    description="Added in 26.4.0. Returns a single resource slot type by slot_name, or null."


Please set the target version to 26.3.0 in all of these descriptions.

HyeockJinKim · 2026-03-06T01:39:26Z

src/ai/backend/manager/services/resource_slot/service.py

+    async def all_slot_types(self, action: AllSlotTypesAction) -> AllSlotTypesResult:
+        rows = await self._repository.all_slot_types()


Please provide the search functionality that was originally offered instead of an 'all' query. I don't want to expose an 'all' option.

HyeockJinKim · 2026-03-06T02:38:44Z

src/ai/backend/manager/api/gql/kernel/types.py

+    async def resource_allocations(
+        self,
+        info: Info[StrawberryGQLContext],
+    ) -> Annotated[
+        ResourceAllocationConnectionGQL,
+        strawberry.lazy("ai.backend.manager.api.gql.resource_slot.types"),
+    ]:
+        """Fetch per-slot resource allocation for this kernel."""
+        from ai.backend.manager.api.gql.resource_slot.fetcher import fetch_kernel_allocations
+
+        return await fetch_kernel_allocations(info=info, kernel_id=str(self.id))


Please follow the existing pattern in which connection fields accept all of the arguments, such as filter, order, before, and so on.

HyeockJinKim · 2026-03-06T04:41:55Z

src/ai/backend/manager/api/gql/agent/types.py

+    async def resource_slots(
+        self,
+        info: Info[StrawberryGQLContext],
+        first: int | None = None,
+        after: str | None = None,
+        last: int | None = None,
+        before: str | None = None,
+        limit: int | None = None,
+        offset: int | None = None,
+    ) -> Annotated[
+        AgentResourceConnectionGQL,
+        strawberry.lazy("ai.backend.manager.api.gql.resource_slot.types"),
+    ]:


The filter and order are missing.

HyeockJinKim · 2026-03-06T04:42:06Z

src/ai/backend/manager/api/gql/kernel/types.py

+    async def resource_allocations(
+        self,
+        info: Info[StrawberryGQLContext],
+        first: int | None = None,
+        after: str | None = None,
+        last: int | None = None,
+        before: str | None = None,
+        limit: int | None = None,
+        offset: int | None = None,
+    ) -> Annotated[


The filter and order are missing.

HyeockJinKim · 2026-03-06T04:42:31Z

src/ai/backend/manager/api/gql/resource_slot/fetcher.py

+async def fetch_agent_resources(
+    info: Info[StrawberryGQLContext],
+    agent_id: str,
+    before: str | None = None,
+    after: str | None = None,
+    first: int | None = None,
+    last: int | None = None,
+    limit: int | None = None,
+    offset: int | None = None,
+) -> AgentResourceConnectionGQL:


The filter and order are missing. Instead of taking agent_id, please check the existing structure that takes scope, filter, and order, and follow it.

…d connections

- Add ResourceSlotTypeGQL(Node) exposing all resource_slot_types columns (slot_name, slot_type, display_name, description, display_unit, display_icon, number_format, rank) with ResourceSlotTypeConnectionGQL
- Add AgentResourceSlotGQL(Node) for per-slot capacity/usage on agents with AgentResourceConnectionGQL; wire as resource_slots field on AgentV2GQL
- Add KernelResourceAllocationGQL(Node) for per-slot allocation on kernels with ResourceAllocationConnectionGQL; wire as resource_allocations field on KernelV2GQL
- Add root queries resource_slot_type(slot_name) and resource_slot_types()
- Shared fetcher functions reused across root queries and connection resolvers
- Add AllSlotTypesAction/GetSlotTypeAction to ResourceSlotService and processors
- Add NumberFormatData to data layer; add RESOURCE_SLOT_TYPE to EntityType enum

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…_data()

Replace fetcher-returning-GQL-type pattern in resolve_nodes with data-returning helpers + cls.from_data() calls, following the established pattern in AgentV2GQL. This satisfies mypy's Iterable[Self | None] constraint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… None

Fetcher functions now propagate the exception so GraphQL returns error info to the user. resolve_nodes still catches it to comply with the relay spec (Iterable[Self | None]).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>