@@ -55,15 +55,18 @@ spec:
value: "true"
- name: DCGM_EXPORTER_LISTEN
value: ":9400"
+ # Soft Go heap ceiling below the hard limit; native DCGM (cgo)
+ # allocations sit on top and scale with GPU count.
  - name: GOMEMLIMIT
-   value: "450MiB"
+   value: "768MiB"

Contributor Author


I asked Claude about this number. Here is its reasoning:

The reasoning

GOMEMLIMIT should sit below the hard memory limit. It is a soft limit on memory managed by the Go runtime (the heap plus runtime
overhead such as goroutine stacks); the native DCGM (cgo) allocations live outside the runtime and are not counted against it. So the gap
between GOMEMLIMIT and the hard limit is the headroom left for everything GOMEMLIMIT doesn't see.

- Hard limit: 1Gi = 1024 MiB
- GOMEMLIMIT: 768 MiB → leaves 256 MiB of headroom for native (cgo) allocations and other memory the Go runtime does not manage.
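
The arithmetic above can be checked with a small Go sketch. It is only an illustration: `headroomBytes` and the `mib` constant are hypothetical names, not part of dcgm-exporter, and the values are the ones proposed in this PR.

```go
package main

import "fmt"

// mib is one mebibyte in bytes (hypothetical helper constant).
const mib = 1 << 20

// headroomBytes returns how much of the container's hard memory limit is left
// for memory GOMEMLIMIT does not account for, such as cgo allocations.
// It rejects a GOMEMLIMIT that is not strictly below the hard limit.
func headroomBytes(hardLimit, goMemLimit int64) (int64, error) {
	if goMemLimit >= hardLimit {
		return 0, fmt.Errorf("GOMEMLIMIT (%d) must be below the hard limit (%d)", goMemLimit, hardLimit)
	}
	return hardLimit - goMemLimit, nil
}

func main() {
	hard := int64(1024 * mib) // limits.memory: 1Gi
	soft := int64(768 * mib)  // GOMEMLIMIT: 768MiB

	h, err := headroomBytes(hard, soft)
	if err != nil {
		panic(err)
	}
	// Prints: headroom: 256 MiB (25% of the hard limit)
	fmt.Printf("headroom: %d MiB (%.0f%% of the hard limit)\n", h/mib, 100*float64(h)/float64(hard))
}
```

Whether 256 MiB is enough depends on how much native memory DCGM actually uses per GPU, which this sketch does not measure.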

resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 200m
- memory: 512Mi
+ # 1Gi: 512Mi OOMKills on 8-GPU nodes (one pod reads every GPU).
+ memory: 1Gi
securityContext:
runAsNonRoot: false
runAsUser: 0