Description
Problem
In src/ai/backend/agent/agent.py, the container_ids collection is built as a list, which could allow duplicate container IDs to be processed multiple times during stat collection.
Affected Code
*File*: src/ai/backend/agent/agent.py
Line 1391 (collect_container_stat)
    container_ids: list[ContainerId] = []
    for kernel_obj in [*self.kernel_registry.values()]:
        if not kernel_obj.stats_enabled or kernel_obj.container_id is None:
            continue
        container_ids.append(ContainerId(kernel_obj.container_id))
    await self.stat_ctx.collect_container_stat(container_ids)
Line 1403 (collect_process_stat)
    container_ids = []
    for kernel_obj in [*self.kernel_registry.values()]:
        if not kernel_obj.stats_enabled or kernel_obj.container_id is None:
            continue
        container_ids.append(ContainerId(kernel_obj.container_id))
    await self.stat_ctx.collect_per_container_process_stat(container_ids)
Proposed Solution
Change from list to set to automatically deduplicate container IDs:
    container_ids: set[ContainerId] = set()
    for kernel_obj in [*self.kernel_registry.values()]:
        if not kernel_obj.stats_enabled or kernel_obj.container_id is None:
            continue
        container_ids.add(ContainerId(kernel_obj.container_id))
Benefits
- *Automatic deduplication*: Prevents duplicate stat collection if the same container ID appears multiple times
- *Performance*: Avoids redundant stat collection calls
- *Semantic correctness*: We want unique container IDs, and a set expresses this intent better than a list
- *Safety*: Defensive programming against potential edge cases in the kernel registry
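The deduplication benefit can be illustrated with a minimal, self-contained sketch. Everything here is a stand-in rather than the agent's actual code: `FakeKernel`, the registry contents, and the two `gather_*` helpers are hypothetical, and `ContainerId` is modeled as a `NewType` over `str`, which is an assumption about the real type.

```python
from dataclasses import dataclass
from typing import NewType, Optional

# Assumption: the real ContainerId behaves like a hashable string type.
ContainerId = NewType("ContainerId", str)


@dataclass
class FakeKernel:
    """Hypothetical stand-in for a kernel object in the registry."""
    container_id: Optional[str]
    stats_enabled: bool = True


def gather_as_list(registry: dict[str, FakeKernel]) -> list[ContainerId]:
    # Mirrors the current pattern: duplicates are kept.
    container_ids: list[ContainerId] = []
    for kernel_obj in [*registry.values()]:
        if not kernel_obj.stats_enabled or kernel_obj.container_id is None:
            continue
        container_ids.append(ContainerId(kernel_obj.container_id))
    return container_ids


def gather_as_set(registry: dict[str, FakeKernel]) -> set[ContainerId]:
    # Mirrors the proposed pattern: duplicates collapse automatically.
    container_ids: set[ContainerId] = set()
    for kernel_obj in [*registry.values()]:
        if not kernel_obj.stats_enabled or kernel_obj.container_id is None:
            continue
        container_ids.add(ContainerId(kernel_obj.container_id))
    return container_ids


# Hypothetical edge case: two registry entries point at the same container.
registry = {
    "k1": FakeKernel("c-aaa"),
    "k2": FakeKernel("c-aaa"),
    "k3": FakeKernel("c-bbb"),
    "k4": FakeKernel(None),                          # skipped: no container yet
    "k5": FakeKernel("c-ccc", stats_enabled=False),  # skipped: stats disabled
}

as_list = gather_as_list(registry)
as_set = gather_as_set(registry)
```

With this registry, the list variant carries `c-aaa` twice, while the set variant holds each ID once. Whether such duplicates can occur in the real kernel registry is not established here; the sketch only shows what the change guards against.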
Implementation
Update both locations:
- Line 1391: collect_container_stat()
- Line 1403: collect_process_stat()
Verify that collect_container_stat() and collect_per_container_process_stat() accept both list and set as input (or update type hints if needed).
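If the type hints do need updating, one possible approach is to annotate the parameter with an abstract collection type so that both list and set callers type-check. The sketch below assumes the callee only iterates over the IDs and checks their count; `collect_stat_stub` is a hypothetical function, not the real `stat_ctx` method, and `ContainerId` is again modeled as a `NewType` over `str`.

```python
from collections.abc import Collection
from typing import NewType

# Assumption: the real ContainerId behaves like a hashable string type.
ContainerId = NewType("ContainerId", str)


def collect_stat_stub(container_ids: Collection[ContainerId]) -> dict[ContainerId, int]:
    """Hypothetical collector that relies only on iteration and len()."""
    if len(container_ids) == 0:
        return {}
    # Placeholder "stat": the length of each ID string.
    return {cid: len(cid) for cid in container_ids}


ids_list = [ContainerId("c-aaa"), ContainerId("c-bbbb")]
ids_set = set(ids_list)

from_list = collect_stat_stub(ids_list)
from_set = collect_stat_stub(ids_set)
```

Note that a set has no defined iteration order and does not support indexing, so this only works if the real callees do not depend on the order or position of the IDs. If they do, converting back with `list(container_ids)` at the call site would be a simple fallback.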
Related
- Parent Epic: BA-4880 (Sprint 26.3 requests)
JIRA Issue: BA-4910