DO NOT MERGE THIS BEFORE MERGING CUTE#15
Summary
This PR adds the AME PMU path for CUTE and wires it into the XSAI CSR/perf plumbing.
The same content applies to both layers:
- `XSAI/CUTE`: add `CUTEPMU`, local probes, and AME perf sideband
- `XSAI`: connect the sideband into `XSCore`, `XSTile`, and `NewCSR`

The new PMU is meant to be software-facing and lightweight:
- `mcycle` / `minstret` are exposed as fixed AME counters
- `mhpmevent3..31` are writable programmable event selectors
- `mhpmcounter3..31` count the selected events with an independent AME 8-bit path
- enabled only when `MatAcc == CUTE`

CSR Map
AME counter CSRs
[cols="^2,^2,^2,^2,10",options="header"]
|===
| CSR | Address | Privilege | Access | Description
| `ame_scounteren` | 0x5E6 | SRW | counteren gate | S-mode counter permission bits
| `ame_hcounteren` | 0x6C6 | HRW | counteren gate | H-mode counter permission bits
| `ame_mcounteren` | 0x7E8 | MRW | counteren gate | M-mode counter permission bits
| `ame_mcountinhibit` | 0x7E9 | MRW | inhibit bits | AME counter inhibit control
| `ame_mhpmevent3..31` | 0xBC3..0xBDF | MRW | event cfg | 29 programmable AME event selectors
| `ame_mcycle` | 0xBE0 | MRW | fixed counter | AME cycle counter
| `ame_minstret` | 0xBE2 | MRW | fixed counter | AME retire counter
| `ame_mhpmcounter3..31` | 0xBE3..0xBFF | MRW | programmable counters | AME programmable counters
| `ame_cycle` | 0xCE0 | URO | shadow | U-mode shadow of `ame_mcycle`
| `ame_instret` | 0xCE2 | URO | shadow | U-mode shadow of `ame_minstret`
| `ame_hpmcounter3..31` | 0xCE3..0xCFF | URO | shadow | U-mode shadow of `ame_mhpmcounter3..31`
| `ame_scountovf` | 0xDE0 | SRO | overflow shadow | AME overflow vector shadow
|===
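The programmable ranges in the map are contiguous, so the address of slot `n` can be computed by adding `n` to a base. The sketch below is illustrative only and is not code from this PR: the helper names are hypothetical, and the bases are simply read off the address ranges in the table.

```python
"""Illustrative address helpers derived from the AME CSR map (hypothetical names)."""

# Bases inferred from the table: slot index n is added to the base.
AME_MHPMEVENT_BASE = 0xBC0    # ame_mhpmevent3..31   -> 0xBC3..0xBDF
AME_MHPMCOUNTER_BASE = 0xBE0  # ame_mcycle at +0, ame_minstret at +2, HPM at +3..+31
AME_HPMCOUNTER_BASE = 0xCE0   # U-mode read-only shadows at the same offsets

FIRST_HPM, LAST_HPM = 3, 31


def _check_index(n: int) -> None:
    """Reject indices outside the programmable window 3..31."""
    if not FIRST_HPM <= n <= LAST_HPM:
        raise ValueError(f"programmable counter index must be in 3..31, got {n}")


def ame_mhpmevent_addr(n: int) -> int:
    """CSR address of ame_mhpmevent<n>."""
    _check_index(n)
    return AME_MHPMEVENT_BASE + n


def ame_mhpmcounter_addr(n: int) -> int:
    """CSR address of ame_mhpmcounter<n>."""
    _check_index(n)
    return AME_MHPMCOUNTER_BASE + n


def ame_hpmcounter_addr(n: int) -> int:
    """CSR address of the U-mode shadow ame_hpmcounter<n>."""
    _check_index(n)
    return AME_HPMCOUNTER_BASE + n
```

For example, `hex(ame_mhpmevent_addr(3))` gives `0xbc3` and `hex(ame_hpmcounter_addr(31))` gives `0xcff`, matching the endpoints listed in the table.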
Notes
- `mhpmevent3..31` use the same RISC-V event-field layout as the existing XiangShan perf counter family.
- `ame_cycle` / `ame_instret` / `ame_hpmcounter*` are read-only shadows only.
- The CSR map defines no `*h` high-half counters.

CSR Bit Layouts
`ame_mcountinhibit` / Counteren

[cols="^2,^2,10",options="header"]
|===
| Bit(s) | Name | Meaning
| 0 | CY | inhibit AME cycle counting
| 2 | IR | inhibit AME instruction-retire counting
| 31:3 | HPM3 | inhibit programmable AME HPM counters 3..31
|===
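As a rough illustration of the layout above, a write value for `ame_mcountinhibit` (or one of the counteren registers, which share the bit positions) can be assembled as a plain bit mask. The function name and arguments are hypothetical and only model the table.

```python
"""Sketch of a mask builder for the CY / IR / HPM bit layout (hypothetical helper)."""

CY_BIT = 0                    # AME cycle counter
IR_BIT = 2                    # AME instruction-retire counter
HPM_FIRST, HPM_LAST = 3, 31   # programmable AME HPM counters


def ame_counter_mask(cycle: bool = False, instret: bool = False, hpm=()) -> int:
    """Return a 32-bit mask with the requested counter bits set."""
    mask = 0
    if cycle:
        mask |= 1 << CY_BIT
    if instret:
        mask |= 1 << IR_BIT
    for n in hpm:
        if not HPM_FIRST <= n <= HPM_LAST:
            raise ValueError(f"HPM index must be in 3..31, got {n}")
        mask |= 1 << n
    return mask
```

`ame_counter_mask(hpm=range(3, 32))` sets every programmable bit and leaves the `CY` and `IR` bits clear.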
`ame_mhpmevent3..31`

[cols="^2,^2,10",options="header"]
|===
| Bit(s) | Name | Meaning
| 63 | OF | sticky overflow flag, driven by counter overflow
| 62 | MINH | M-mode inhibit
| 61 | SINH | S-mode inhibit
| 60 | UINH | U-mode inhibit
| 59 | VSINH | VS-mode inhibit
| 58 | VUINH | VU-mode inhibit
| 54:50 | OPTYPE2 | event combination op for event group 2
| 49:45 | OPTYPE1 | event combination op for event group 1
| 44:40 | OPTYPE0 | event combination op for event group 0
| 39:30 | EVENT3 | event id group 3
| 29:20 | EVENT2 | event id group 2
| 19:10 | EVENT1 | event id group 1
| 9:0 | EVENT0 | event id group 0
|===
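The field layout above can be modeled with a small pack/unpack pair, which may be handy when composing selector values in test scripts. This is a sketch under the assumption that the table is complete; the function names are hypothetical, and the `OPTYPE` encodings are not interpreted here because the description does not list them.

```python
"""Sketch of pack/unpack helpers for the ame_mhpmevent field layout (hypothetical names)."""

# Field name -> (least significant bit, width in bits), copied from the table.
FIELDS = {
    "OF": (63, 1),
    "MINH": (62, 1),
    "SINH": (61, 1),
    "UINH": (60, 1),
    "VSINH": (59, 1),
    "VUINH": (58, 1),
    "OPTYPE2": (50, 5),
    "OPTYPE1": (45, 5),
    "OPTYPE0": (40, 5),
    "EVENT3": (30, 10),
    "EVENT2": (20, 10),
    "EVENT1": (10, 10),
    "EVENT0": (0, 10),
}


def pack_mhpmevent(**fields: int) -> int:
    """Build a 64-bit selector value from named fields; unnamed fields stay zero."""
    value = 0
    for name, field_value in fields.items():
        if name not in FIELDS:
            raise KeyError(f"unknown field {name!r}")
        lsb, width = FIELDS[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits: {field_value}")
        value |= field_value << lsb
    return value


def unpack_mhpmevent(value: int) -> dict:
    """Split a 64-bit selector value into its named fields."""
    return {
        name: (value >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in FIELDS.items()
    }
```

For instance, `pack_mhpmevent(EVENT0=16, UINH=1)` places event id 16 in group 0 and sets the U-mode inhibit bit.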
`ame_scountovf`

[cols="^2,^2,10",options="header"]
|===
| Bit(s) | Name | Meaning
| 31:3 | OFVEC | overflow vector for AME HPM counters 3..31
| 2:0 | - | reserved / zero
|===
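Reading the overflow vector back is a matter of scanning bits 31:3. A minimal sketch, with a hypothetical helper name, could look like this:

```python
"""Sketch: list the AME HPM counters flagged in an ame_scountovf value."""

from typing import List


def overflowed_counters(scountovf: int) -> List[int]:
    """Return the counter indices (3..31) whose OFVEC bit is set.

    Bits 2:0 are reserved / zero in the layout, so they are ignored here.
    """
    return [n for n in range(3, 32) if (scountovf >> n) & 1]
```

A value with only bit 3 set reports counter 3, and any stray low bits are skipped rather than treated as counters.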
Permission gating
AME counters use an independent counteren chain:
- `ame_mcounteren` gates machine/supervisor/hypervisor visibility
- `ame_hcounteren` gates VS/VU visibility
- `ame_scounteren` gates user-facing access below S
- `URO` shadow counters are still subject to the corresponding permission checks
- `ame_scountovf` follows the same read-mask style as the existing `scountovf`

Event Table
`CUTEPMU` uses a single event pool of 20 entries, with entry `0` reserved as `noEvent`. The remaining entries are wired from `TaskController` and `LocalMMU`.

[cols="^2,^2,^2,10",options="header"]
|===
| ID | Name | Source | Description
| 0 | `noEvent` | - | reserved
| 1 | `amu_load_a_done` | `TaskController` | A-load completion
| 2 | `amu_load_b_done` | `TaskController` | B-load completion
| 3 | `amu_load_c_done` | `TaskController` | C-load completion
| 4 | `amu_store_done` | `TaskController` | store completion
| 5 | `amu_comp_done` | `TaskController` | compute completion
| 6 | `amu_release_done` | `TaskController` | release completion
| 7 | `amu_mma_nonfp` | `TaskController` | non-FP MMA completion
| 8 | `amu_mma_fp16` | `TaskController` | FP16 MMA completion
| 9 | `amu_mma_bf16` | `TaskController` | BF16 MMA completion
| 10 | `amu_mma_tf32` | `TaskController` | TF32 MMA completion
| 11 | `amu_aml_active` | `TaskController` | AML busy cycle
| 12 | `amu_bml_active` | `TaskController` | BML busy cycle
| 13 | `amu_cml_load_active` | `TaskController` | CML-load busy cycle
| 14 | `amu_mte_active` | `TaskController` | MTE busy cycle
| 15 | `amu_cml_store_active` | `TaskController` | CML-store busy cycle
| 16 | `amu_mem_rd_req` | `LocalMMU` | read request fire
| 17 | `amu_mem_wr_req` | `LocalMMU` | write request fire
| 18 | `amu_mem_rd_bytes_req` | `LocalMMU` | read request bytes
| 19 | `amu_mem_wr_bytes_req` | `LocalMMU` | write request bytes
|===
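For scripting against the PMU, the event pool above can be mirrored as an enumeration. The sketch below also includes a very rough accumulation model. It assumes, based on the `PerfEventAme(value: UInt(8.W))` bundle and the 64-bit CSR data width listed in the implementation details, that each event delivers an increment of at most 255 per cycle into a 64-bit counter. All names are hypothetical, and the `OPTYPE` combination logic is deliberately not modeled.

```python
"""Sketch: AME event ids as an enum plus a simplified single-event counter step."""

from enum import IntEnum

MASK64 = (1 << 64) - 1  # assumed counter width
MAX_INCREMENT = 0xFF    # assumed 8-bit per-cycle increment


class AmeEvent(IntEnum):
    NO_EVENT = 0
    AMU_LOAD_A_DONE = 1
    AMU_LOAD_B_DONE = 2
    AMU_LOAD_C_DONE = 3
    AMU_STORE_DONE = 4
    AMU_COMP_DONE = 5
    AMU_RELEASE_DONE = 6
    AMU_MMA_NONFP = 7
    AMU_MMA_FP16 = 8
    AMU_MMA_BF16 = 9
    AMU_MMA_TF32 = 10
    AMU_AML_ACTIVE = 11
    AMU_BML_ACTIVE = 12
    AMU_CML_LOAD_ACTIVE = 13
    AMU_MTE_ACTIVE = 14
    AMU_CML_STORE_ACTIVE = 15
    AMU_MEM_RD_REQ = 16
    AMU_MEM_WR_REQ = 17
    AMU_MEM_RD_BYTES_REQ = 18
    AMU_MEM_WR_BYTES_REQ = 19


def event_source(event: AmeEvent):
    """Return the module that drives the event, or None for the reserved entry."""
    if event == AmeEvent.NO_EVENT:
        return None
    return "TaskController" if event <= AmeEvent.AMU_CML_STORE_ACTIVE else "LocalMMU"


def step_counter(count: int, selected: AmeEvent, increments: dict):
    """Advance one counter by one cycle.

    `increments` maps event ids to this cycle's increment (0..255).
    Returns (new_count, overflowed); NO_EVENT never increments.
    """
    inc = 0 if selected == AmeEvent.NO_EVENT else increments.get(selected, 0)
    if not 0 <= inc <= MAX_INCREMENT:
        raise ValueError(f"increment must fit in 8 bits, got {inc}")
    total = count + inc
    return total & MASK64, total > MASK64
```

In this model a slot selecting `AMU_MEM_RD_BYTES_REQ` would advance by the number of bytes reported for that cycle, while a slot left at `NO_EVENT` stays constant.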
Slot mapping
- `mhpmevent3..31` are mapped one-per-slot to the selected event id.
- `AmeCounterNum` is `29`, so the programmable window covers exactly `3..31`.

Behavior
- `ame_mcycle` counts AME-owned work cycles.
- `ame_minstret` counts AME retire completion.
- Counter overflow sets `OF` and feeds `ame_scountovf`.
- When `enableAme` is false, AME CSR logic and the AME sideband are not generated.
- `mhpmevent` combination semantics remain the existing XiangShan 4-event composition scheme.

Implementation Details
`XSAI/CUTE`

- `Bundles.scala`
  - `PerfEventAme(value: UInt(8.W))`
  - `AmeCSRWriteBundle(addr: UInt(12.W), data: UInt(64.W))`
  - `AmePerfFromCSRIO` / `AmePerfToCoreIO` / `CutePerfIO`
  - `TaskControllerPerfProbe` and `LocalMMUPerfProbe`
- `CUTEParameters.scala`
  - `AmeCounterNum = 29`
  - `outsideDataWidthByte` as the byte unit used by memory byte counters
- `TaskController.scala`
- `LocalMMU.scala`
  - `PopCount(RequestMask)`
- `CUTETOP.scala`
  - CUTEPMUfromCSR.csrW, `taskProbe`, `mmuProbe`, and `toCore`
- `CUTEPMU.scala`
  - `PFEventAme`, `HPerfCounterAme`, `HPerfMonitorAme`

`XSAI`

- `XSTile.scala`
  - `XSCute` and `XSCore`
  - `enableAme = HasMatrixExtension && (MatAccKey == MatAcc.CUTE)`
- `XSCuteTop.scala`
- `XSCore.scala`
- `CSR.scala`
  - `PerfCounterIO` with AME-domain counters
- `NewCSR.scala`
- `MachineLevel.scala`
  - `ame_mcounteren`, `ame_mcountinhibit`, `ame_mhpmevent3..31`, `ame_mcycle`, `ame_minstret`, `ame_mhpmcounter3..31`
- `SupervisorLevel.scala`
  - `ame_scounteren`, `ame_scountovf`
- `HypervisorLevel.scala`
  - `ame_hcounteren`
- `Unprivileged.scala`
  - `ame_cycle`, `ame_instret`, `ame_hpmcounter*`
- `CSRPermitModule.scala`
- `CSRConst.scala`

Review Focus
- `mhpmevent` combination and OF behavior
- `CUTEPMU` event coverage and the 19 currently implemented event ids
- `XSAI/CUTE` and `XSAI`