MPAM: Pull Request: CPU-less feature, numa id as domain id, performance fix by fyu1 · Pull Request #328 · NVIDIA/NV-Kernels

fyu1 · 2026-02-23T19:52:12Z

Please merge the following MPAM commits:

DGX-15400 MPAM MBW monitoring events missing in 6.17 devel. This issue is fixed by this commit:

NVIDIA: SAUCE: arm_mpam: Fix missing mbm_local_bytes and mbm_total_bytes

DGX-15561 MPAM: stream performance degradation on 6.17-devel and 6.19-rc upstream v3. This issue is fixed by this commit:

NVIDIA: SAUCE: arm_mpam: Fix memory access performance issue due to too small mbw_min

CPU-less memory node enabling and NUMA node id as domain id. Commits:

NVIDIA: SAUCE: arm_mpam: Fix support for CPU-less NUMA nodes in memory...
NVIDIA: SAUCE: arm_mpam: Add memory type checks to support mbw monitor event assignment mode
NVIDIA: SAUCE: arm_mpam: Handle CPU-less numa nodes
NVIDIA: SAUCE: arm_mpam: Include all associated MSC components during domain setup
NVIDIA: SAUCE: arm_mpam: Sort the domain list by domain-id

MBW_MIN support. Commits:

NVIDIA: SAUCE: arm_mpam: Add support for MBW_MIN

Misc fixes:

NVIDIA: SAUCE: fs/resctrl: Export the closid/rmid to user-space
NVIDIA: SAUCE: arm_mpam: Avoid MSC teardown for the SW programming errors

resctrl expects the domain list to be sorted by id. Do that.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
[ morse: Pulled out of a larger patch ]
Signed-off-by: James Morse <james.morse@arm.com>
(forward ported from commit 2549a35ffbfd18d785bb35b39107de93d4bd3c7f https://git-master.nvidia.com/r/a/linux-stable)
[fenghuay: Remove "FIX ME" in the subject to avoid confusion.]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
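The ordering requirement in the commit above can be illustrated with a small user-space sketch. This is not the driver code: `struct demo_domain` and `demo_domain_add_sorted()` are hypothetical names, and a plain singly linked list stands in for the kernel's list type. It only shows the idea of inserting each new domain at the position that keeps ids ascending.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a resctrl domain header; not the kernel's struct. */
struct demo_domain {
	int id;
	struct demo_domain *next;
};

/*
 * Insert @d into the singly linked list at *@head, keeping ascending id order.
 * Returns 0 on success, -1 if a domain with the same id is already listed.
 */
static int demo_domain_add_sorted(struct demo_domain **head, struct demo_domain *d)
{
	struct demo_domain **pos = head;

	assert(head != NULL && d != NULL);

	/* Walk to the first entry whose id is not smaller than the new one. */
	while (*pos && (*pos)->id < d->id)
		pos = &(*pos)->next;

	/* Refuse duplicates so every id appears at most once. */
	if (*pos && (*pos)->id == d->id)
		return -1;

	d->next = *pos;
	*pos = d;
	return 0;
}
```

Inserting at the right position on every add keeps the list sorted at all times, so readers never see an unsorted intermediate state and no separate sort pass is needed.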

… domain setup

The current MPAM driver only considers the first component associated with an online/offline CPU during domain creation and teardown. This is insufficient, as CPU-initiated traffic may traverse multiple MSCs before reaching the target, and each MSC must be programmed consistently for proper resource partitioning.

Update the MPAM driver to include all components associated with a given CPU during domain setup/teardown to expose expected schemata to userspace for effective resource control.

Change-Id: I1eb106495f4e2d4d50cd3d7f2c41800a314764c3
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(forward ported from commit fe7dfd164dda542070ca533715c8ec53b1b08fe0 https://git-master.nvidia.com/r/a/linux-stable)
[fenghuay: solve conflicts, change cpu parameter in mpam_resctrl_offline_domain_hdr(), change cpu parameter in mpam_resctrl_alloc_domain_cpu(), change dom->comp to dom->ctrl_comp]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…rors

There is no need to destroy the MSC instance on user/admin programming errors, since they do not cause any functional issues.

Change-Id: I7734c7d63e8f38d038ba202dcb1da8102183a2eb
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(cherry picked from commit abb499798dfe50a93a8e8b376af85e0cf614cb5f https://git-master.nvidia.com/r/a/linux-stable)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

MPAM supports minimum bandwidth partitioning. Add logic to handle MBW_MIN.

The unimplemented bits in MPAMCFG_MBW_MIN are RAZ/WI, so masking is unnecessary. Apply the same logic to MPAMCFG_MBW_MAX and MPAMCFG_CMAX to simplify the code and match 'cat schemata' values to user programmed inputs.

Change-Id: I5b1ce4be69a5d75e8814ebaad7acfe061add2e0b
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(cherry picked from commit 0e5902b38181666cd4a247eadd30c4e6cbcea1c0 https://git-master.nvidia.com/r/a/linux-stable)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Control and monitor groups have a CLOSID and/or RMID that is used to count the cache usage and memory bandwidth of tasks in this group.

Not all of MPAM's counters can be exposed via resctrl, as each counter also needs a monitor to be allocated. It is unlikely there are enough monitors for every RMID to have a monitor permanently allocated.

To allow counters to be read via perf, the RMID that a control or monitor group is using needs exposing to user-space. This can be passed back to perf as a parameter. MPAM's PMG values are not unique, so the PARTID needs to be provided too. Perf allows a number of u64 arguments, which is not enough to encode a control/monitor group name.

Similarly, there has been some interest in allowing cgroup to manage the tasks file for resctrl. Exposing a unique identifier for each control or monitor group will allow cgroups to point to a resctrl group that holds its configuration.

Provide a file in each control or monitor group that returns a unique identifier. When passed back to the kernel, resctrl can decode this into a closid/rmid, or just identify the control or monitor group.

The value is xor'd with a value picked at boot as obfuscation. This prevents user-space from relying on the layout of this field or re-using values between boots of the system, which allows the kernel to change the layout of this field in the future.

Change-Id: I5ce7fcbbfb90edc8a104ecc0fec2d7ec0b8583e4
Signed-off-by: James Morse <james.morse@arm.com>
[sonthineni: Fix build warning messages for v6.13]
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(cherry picked from commit 808be5354fb01bcb62d2405631f61fc874d5747c https://git-master.nvidia.com/r/a/linux-stable)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
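The xor obfuscation described in the commit above can be sketched as follows. Everything here is an assumption made for illustration: the packing of closid into the upper 32 bits and rmid into the lower 32 bits, the `demo_*` names, and the fixed key. The commit message deliberately leaves the real layout unspecified so that it can change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of ids recovered from an exported identifier. */
struct demo_ids {
	uint32_t closid;
	uint32_t rmid;
};

/* Pack the ids (assumed layout: closid high, rmid low) and hide them with the key. */
static uint64_t demo_encode_id(uint32_t closid, uint32_t rmid, uint64_t boot_key)
{
	uint64_t raw = ((uint64_t)closid << 32) | rmid;

	return raw ^ boot_key;
}

/* xor is its own inverse, so applying the same key recovers the packed value. */
static struct demo_ids demo_decode_id(uint64_t id, uint64_t boot_key)
{
	uint64_t raw = id ^ boot_key;
	struct demo_ids ids = {
		.closid = (uint32_t)(raw >> 32),
		.rmid = (uint32_t)raw,
	};

	return ids;
}

/* Usage: a value handed to user-space decodes back to the ids that produced it. */
static void demo_roundtrip(void)
{
	uint64_t key = 0x5a5a1234deadbeefULL; /* a real key would be picked at boot */
	uint64_t id = demo_encode_id(3, 7, key);
	struct demo_ids ids = demo_decode_id(id, key);

	assert(ids.closid == 3 && ids.rmid == 7);
	/* The exported value differs from the raw packing, so user-space cannot rely on it. */
	assert(id != (((uint64_t)3 << 32) | 7));
}
```

Because the key changes on every boot, an identifier saved from a previous boot decodes to unrelated ids, which is the property the commit message relies on to discourage reuse.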

In a NUMA system, each node may include CPUs, memory, MPAM MSC instances, or any combination thereof. Some high-end servers may have NUMA nodes that include MPAM MSC but no CPUs. In such cases, associate all possible CPUs for those MSCs.

Change-Id: Id3e26278b7ced9e7866f8ec6c77f99430e5dad60
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(cherry picked from commit c92f60d532b4d281592c26f3a409998a568c4150 https://git-master.nvidia.com/r/a/linux-stable)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
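The fallback described in the commit above amounts to a single decision, sketched here in user-space C. The 64-bit mask type and the name `demo_msc_affinity()` are hypothetical; the kernel uses its own cpumask helpers, which are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU mask: one bit per CPU, enough for 64 CPUs in this sketch. */
typedef uint64_t demo_cpumask_t;

/*
 * Choose the CPUs an MSC is associated with.
 * @node_cpus:     CPUs that belong to the MSC's NUMA node (may be empty).
 * @possible_cpus: every CPU that could ever come online.
 */
static demo_cpumask_t demo_msc_affinity(demo_cpumask_t node_cpus,
					demo_cpumask_t possible_cpus)
{
	assert(possible_cpus != 0);

	/* CPU-less node: any CPU may generate traffic to it, so use them all. */
	if (node_cpus == 0)
		return possible_cpus;

	return node_cpus;
}
```

For example, `demo_msc_affinity(0, 0xff)` yields `0xff`, while a node that owns CPUs 2 and 3 keeps its own mask `0x0c`.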

Fix sighting: DGX-15400 MPAM MBW monitoring events missing in 6.17 devel

The memory bandwidth monitoring event mbm_local_bytes is missing due to bugs in the MPAM driver:
1. type is not passed to arg in mpam_msmon_read();
2. After the llc occupancy event is handled, mpam_resctrl_pick_counters() returns without continuing to handle the mbw local and total bytes events.

Fix the issues to allow enabling the mbw local and total bytes events.

Fixes: 2470378 ("NVIDIA: SAUCE: arm_mpam: Use long MBWU counters if supported")
Fixes: 977c7eb ("NVIDIA: SAUCE: untested: arm_mpam: resctrl: pick classes for use as mbm counters")
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…r event assignment mode

The mbm_total_bytes event is now in the mon_MB_xx file. Add a class memory type check so the event is allowed there. This enables NUMA NID support for mb_event counter assignment mode.

Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…y bandwidth monitoring and control

Fix multiple issues preventing MBM and MBA for CPU-less NUMA nodes. Add mutex_lock/_unlock(&domain_list_lock) for proper synchronization.

Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

fyu1 · 2026-02-23T20:12:16Z

test plan: http://10.112.214.86/vera/tb500_mpam_tests

nirmoy · 2026-02-23T20:28:11Z

drivers/resctrl/mpam_devices.c

-	min_hw_granule = ~max_hw_value;
+	if (mpam_has_feature(mpam_feat_mbw_max, cfg)) {
+		u16 delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
+


This needs a comment on why 5% less than MAX_BW instead of == MAX_BW.

clsotog · 2026-02-24T18:59:05Z

@fyu1
In commit b39b9c3, the sign-off does not look correct, as Jamie pointed out in another MR.

Also, some commits cite https://git-master.nvidia.com/r/a/linux-stable as the cherry-pick source, but when I click that link I get Not Found.

…oo small mbw_min

DGX-15561 MPAM: stream performance degradation on 6.17-devel and 6.19-rc upstream v3

mbw_min sets the minimum memory bandwidth. If mbw_min is set too small at boot time, memory bandwidth can be low under memory contention. In some cases, this value is 1, which means memory bandwidth can be as low as 1% of total memory bandwidth. This degrades memory access performance.

According to the T241-MPAM-4 erratum:
In the T241 implementation of memory-bandwidth partitioning, in the absence of contention for bandwidth, the minimum bandwidth setting can affect the amount of achieved bandwidth. Specifically, the achieved bandwidth in the absence of contention can settle to any value between the values of MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX. Also, if MPAMCFG_MBW_MIN is set to zero (below 0.78125%), once a core enters a throttled state, it will never leave that state.

The first issue is not a concern if the MPAM software allows MPAMCFG_MBW_MIN to be programmed through the sysfs interface. This patch ensures that MBW_MIN=1 (0.78125%) is programmed whenever MPAMCFG_MBW_MIN=0 is requested.

In the scenario where resctrl doesn't support the MBW_MIN interface via sysfs, to achieve bandwidth closer to MBW_MAX in the absence of contention, software should configure a relatively narrow gap between MBW_MIN and MBW_MAX. The recommendation is to use a 5% gap to mitigate the problem.

The new workaround is changed to:
1. Set mbw_min to 95% of mbw_max so memory bandwidth will be used as much as possible.
2. If, for any reason, the calculated 95% of mbw_max is smaller than 1, mbw_min falls back to 1 to avoid entering the throttled state.

This is backported from MPAM series 2 v5 that is being reviewed on LKML:
https://lore.kernel.org/lkml/20260224175720.2663924-39-ben.horgan@arm.com/

Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
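The two-step workaround in the commit message above can be sketched as plain arithmetic. This follows the wording of the commit message (95% of mbw_max, floor of 1), not the exact code under review, which derives a delta from MPAMCFG_MBW_MAX_MAX instead. The function name, the floor macro, and the treatment of the values as unit-less 16-bit integers are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed smallest value that keeps a core from getting stuck in the throttled state. */
#define DEMO_MBW_MIN_FLOOR 1u

/*
 * Derive mbw_min from mbw_max:
 *   1. take 95% of mbw_max, leaving a 5% gap;
 *   2. never go below the floor, even if the 95% value rounds down to 0.
 */
static uint16_t demo_mbw_min_from_max(uint16_t mbw_max)
{
	/* Widen to 32 bits so the multiplication cannot overflow. */
	uint32_t min = ((uint32_t)mbw_max * 95u) / 100u;

	if (min < DEMO_MBW_MIN_FLOOR)
		min = DEMO_MBW_MIN_FLOOR;

	assert(min <= UINT16_MAX);
	return (uint16_t)min;
}
```

With these assumptions, an mbw_max of 100 gives an mbw_min of 95, and an mbw_max of 0 or 1 gives the floor value 1 because integer division rounds the 95% result down to 0.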

shankerd04 and others added 9 commits February 23, 2026 18:34

fyu1 assigned nirmoy and nvmochs Feb 23, 2026

fyu1 requested review from clsotog and jamieNguyenNVIDIA and removed request for clsotog February 23, 2026 19:54

nirmoy reviewed Feb 23, 2026


fyu1 force-pushed the 24.04_linux-nvidia-6.17-next branch from b39b9c3 to 769cf7e Compare February 24, 2026 20:35

fyu1 wants to merge 10 commits into NVIDIA:24.04_linux-nvidia-6.17-next from fyu1:24.04_linux-nvidia-6.17-next


5 participants
