MPAM: Pull Request: CPU-less feature, numa id as domain id, performance fix#328
Open
fyu1 wants to merge 10 commits intoNVIDIA:24.04_linux-nvidia-6.17-nextfrom
resctrl expects the domain list to be sorted by id. Do that. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> [ morse: Pulled out of a larger patch ] Signed-off-by: James Morse <james.morse@arm.com> (forward ported from commit 2549a35ffbfd18d785bb35b39107de93d4bd3c7f https://git-master.nvidia.com/r/a/linux-stable) [fenghuay: Remove "FIX ME" in the subject to avoid confusion.] Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
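A minimal userspace sketch of keeping a domain list sorted by id at insertion time. The `struct domain` type and `domain_insert_sorted()` helper are hypothetical stand-ins for illustration, not the driver's actual types or list API.

```c
#include <stddef.h>

/* Hypothetical stand-in for a resctrl domain; only the id matters here. */
struct domain {
    int id;
    struct domain *next;
};

/*
 * Insert 'd' into the list at *head, keeping ascending id order.
 * Returns 0 on success, -1 if a domain with the same id already exists.
 */
static int domain_insert_sorted(struct domain **head, struct domain *d)
{
    struct domain **pos = head;

    /* Walk to the first entry whose id is not smaller than the new one. */
    while (*pos && (*pos)->id < d->id)
        pos = &(*pos)->next;

    /* Reject duplicates so every id appears at most once. */
    if (*pos && (*pos)->id == d->id)
        return -1;

    d->next = *pos;
    *pos = d;
    return 0;
}
```

Inserting in order keeps the list sorted at all times, so readers that assume ascending ids never observe an unsorted list.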
… domain setup The current MPAM driver only considers the first component associated with an online/offline CPU during domain creation and teardown. This is insufficient, as CPU-initiated traffic may traverse multiple MSCs before reaching the target, and each MSC must be programmed consistently for proper resource partitioning. Update the MPAM driver to include all components associated with a given CPU during domain setup/teardown to expose expected schemata to userspace for effective resource control. Change-Id: I1eb106495f4e2d4d50cd3d7f2c41800a314764c3 Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> (forward ported from commit fe7dfd164dda542070ca533715c8ec53b1b08fe0 https://git-master.nvidia.com/r/a/linux-stable) [fenghuay: solve conflicts, change cpu parameter in mpam_resctrl_offline_domain_hdr(), change cpu parameter in mpam_resctrl_alloc_domain_cpu(), change dom->comp to dom->ctrl_comp] Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…rors No need to destroy the MSC instance for user/admin programming errors since they do not cause any functional issues. Change-Id: I7734c7d63e8f38d038ba202dcb1da8102183a2eb Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> (cherry picked from commit abb499798dfe50a93a8e8b376af85e0cf614cb5f https://git-master.nvidia.com/r/a/linux-stable) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
MPAM supports minimum bandwidth partitioning. Add logic to handle MBW_MIN. The unimplemented bits in MPAMCFG_MBW_MIN are RAZ/WI, so masking is unnecessary. Apply the same logic to MPAMCFG_MBW_MAX and MPAMCFG_CMAX to simplify the code and match 'cat schemata' values to user programmed inputs. Change-Id: I5b1ce4be69a5d75e8814ebaad7acfe061add2e0b Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> (cherry picked from commit 0e5902b38181666cd4a247eadd30c4e6cbcea1c0 https://git-master.nvidia.com/r/a/linux-stable) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Control and monitor groups have a CLOSID and/or RMID that is used to count the cache usage and memory bandwidth of tasks in this group. Not all of MPAM's counters can be exposed via resctrl, as each counter also needs a monitor to be allocated. It is unlikely there are enough monitors for every RMID to have a monitor permanently allocated. To allow counters to be read via perf, the RMID that a control or monitor group is using needs exposing to user-space. This can be passed back to perf as a parameter. MPAM's PMG values are not unique, so the PARTID needs to be provided too. Perf allows a number of u64 arguments, which is not enough to encode a control/monitor group name. Similarly, there has been some interest in allowing cgroup to manage the tasks file for resctrl. Exposing a unique identifier for each control or monitor group will allow cgroups to point to a resctrl group that holds its configuration. Provide a file in each control or monitor group that returns a unique identifier. When passed back to the kernel, resctrl can decode this into a closid/rmid, or just identify the control or monitor group. The value is xor'd with a value picked at boot as obfuscation. This prevents user-space from relying on the layout of this field, or re-using values between boots of the system, and so allows the kernel to change the layout of this field in the future. Change-Id: I5ce7fcbbfb90edc8a104ecc0fec2d7ec0b8583e4 Signed-off-by: James Morse <james.morse@arm.com> [sonthineni: Fix build warning messages for v6.13] Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> (cherry picked from commit 808be5354fb01bcb62d2405631f61fc874d5747c https://git-master.nvidia.com/r/a/linux-stable) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
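The XOR obfuscation described in this commit message can be sketched as follows. The field layout (closid in the upper 32 bits, rmid in the lower 32) and the function names are assumptions made for illustration; the commit message deliberately leaves the real layout unspecified so it can change.

```c
#include <stdint.h>

/*
 * Hypothetical layout: closid in the upper 32 bits, rmid in the lower 32.
 * 'boot_key' stands in for the value picked once at boot.
 */
static uint64_t group_id_encode(uint32_t closid, uint32_t rmid,
                                uint64_t boot_key)
{
    uint64_t raw = ((uint64_t)closid << 32) | rmid;

    /* XOR hides the raw layout from user-space. */
    return raw ^ boot_key;
}

/* XOR with the same key undoes the obfuscation and recovers both fields. */
static void group_id_decode(uint64_t id, uint64_t boot_key,
                            uint32_t *closid, uint32_t *rmid)
{
    uint64_t raw = id ^ boot_key;

    *closid = (uint32_t)(raw >> 32);
    *rmid = (uint32_t)raw;
}
```

Because the key differs between boots, the same closid/rmid pair yields a different identifier each time, which discourages user-space from caching or parsing the value.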
In a NUMA system, each node may include CPUs, memory, MPAM MSC instances, or any combination thereof. Some high-end servers may have NUMA nodes that include MPAM MSC but no CPUs. In such cases, associate all possible CPUs for those MSCs. Change-Id: Id3e26278b7ced9e7866f8ec6c77f99430e5dad60 Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> (cherry picked from commit c92f60d532b4d281592c26f3a409998a568c4150 https://git-master.nvidia.com/r/a/linux-stable) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Fix sighting: DGX-15400 MPAM MBW monitoring events missing in 6.17 devel Memory bandwidth monitoring event mbm_local_bytes is missing due to bugs in the MPAM driver: 1. type is not passed to arg in mpam_msmon_read(); 2. After the llc occupancy event is handled, mpam_resctrl_pick_counters() returns without continuing to handle the mbw local and total bytes events. Fix both issues so that the mbw local and total bytes events can be enabled. Fixes: 2470378 ("NVIDIA: SAUCE: arm_mpam: Use long MBWU counters if supported") Fixes: 977c7eb ("NVIDIA: SAUCE: untested: arm_mpam: resctrl: pick classes for use as mbm counters") Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…r event assignment mode mbm_total_bytes event is in mon_MB_xx file now. Add class memory type check to allow the event in place. This enables NUMA NID support for mb_event counter assignment mode. Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…y bandwidth monitoring and control Fix multiple issues preventing MBM and MBA for CPU-less NUMA nodes. Add mutex_lock/_unlock(&domain_list_lock) for proper synchronization. Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
test plan: http://10.112.214.86/vera/tb500_mpam_tests
nirmoy
reviewed
Feb 23, 2026
min_hw_granule = ~max_hw_value;
if (mpam_has_feature(mpam_feat_mbw_max, cfg)) {
	u16 delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
This needs a comment on why 5% less than MAX_BW instead of == MAX_BW.
@fyu1 Also, some commits cite https://git-master.nvidia.com/r/a/linux-stable as the cherry-pick source, but when I click that link I get Not Found.
…oo small mbw_min
DGX-15561 MPAM: stream performance degradation on 6.17-devel and 6.19-rc
upstream v3
mbw_min sets the minimum memory bandwidth. If mbw_min is set too small
during boot time, memory bandwidth could be low under memory contention.
In some cases, this value is 1, which means memory bandwidth can
be as low as 1% of total memory bandwidth. This degrades memory access
performance.
According to T241-MPAM-4 erratum:
In the T241 implementation of memory-bandwidth partitioning, in the
absence of contention for bandwidth, the minimum bandwidth setting
can affect the amount of achieved bandwidth. Specifically, the
achieved bandwidth in the absence of contention can settle to any
value between the values of MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX.
Also, if MPAMCFG_MBW_MIN is set zero (below 0.78125%), once a core
enters a throttled state, it will never leave that state.
The first issue is not a concern if the MPAM software allows
MPAMCFG_MBW_MIN to be programmed through the sysfs interface. This patch
ensures that MBW_MIN=1 (0.78125%) is programmed whenever MPAMCFG_MBW_MIN=0
is requested.
In the scenario where resctrl doesn't support the MBW_MIN
interface via sysfs, to achieve bandwidth closer to MBW_MAX in the
absence of contention, software should configure a relatively narrow
gap between MBW_MIN and MBW_MAX. The recommendation is to use a 5%
gap to mitigate the problem.
The new workaround is:
1. Set mbw_min to 95% of mbw_max so memory bandwidth will be used as
much as possible.
2. If for any reason the calculation of 95% of mbw_max is smaller than
1, mbw_min falls back to 1 to avoid entering the throttled state.
This is backported from MPAM series 2 v5 that is being reviewed on LKML:
https://lore.kernel.org/lkml/20260224175720.2663924-39-ben.horgan@arm.com/
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
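The two-step workaround in the commit message above can be sketched as below. This is an illustration under stated assumptions, not the driver code: `mbw_min_from_max()` is a hypothetical helper, and both values are treated as plain hardware units in which 1 is the smallest nonzero setting (the commit message equates 1 with 0.78125%, that is, 1/128).

```c
#include <stdint.h>

/*
 * Derive mbw_min from mbw_max (hypothetical helper).
 * Step 1: take 95% of mbw_max, leaving the recommended 5% gap.
 * Step 2: never return 0, since a zero minimum can leave a core
 *         stuck in the throttled state.
 */
static uint16_t mbw_min_from_max(uint16_t mbw_max)
{
    /* Widen to 32 bits so the multiplication cannot overflow. */
    uint32_t min = ((uint32_t)mbw_max * 95) / 100;

    if (min < 1)
        min = 1;

    return (uint16_t)min;
}
```

Integer division rounds down, so very small mbw_max values (0 or 1) compute to 0 and take the fallback path.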
b39b9c3 to
769cf7e
Compare
Please merge the following MPAM commits:
NVIDIA: SAUCE: arm_mpam: Fix missing mbm_local_bytes and mbm_total_bytes
DGX-15561 MPAM: stream performance degradation on 6.17-devel and 6.19-rc upstream v3. This issue is fixed by this commit:
NVIDIA: SAUCE: arm_mpam: Fix memory access performance issue due to too small mbw_min
CPU-less memory node enabling and NUMA node id as domain id. Commits:
NVIDIA: SAUCE: arm_mpam: Fix support for CPU-less NUMA nodes in memory...
NVIDIA: SAUCE: arm_mpam: Add memory type checks to support mbw monitor event assignment mode
NVIDIA: SAUCE: arm_mpam: Handle CPU-less numa nodes
NVIDIA: SAUCE: arm_mpam: Include all associated MSC components during domain setup
NVIDIA: SAUCE: arm_mpam: Sort the domain list by domain-id
MBW_MIN support. Commits:
NVIDIA: SAUCE: arm_mpam: Add support for MBW_MIN
misc fixes:
NVIDIA: SAUCE: fs/resctrl: Export the closid/rmid to user-space
NVIDIA: SAUCE: arm_mpam: Avoid MSC teardown for the SW programming errors