[1/n] Add generalized event types and GPU Performance Monitoring counter event support #1212
briancoutinho wants to merge 1 commit into pytorch:main
Conversation
Hi @briancoutinho, this update is absolutely fantastic! As an out-of-tree accelerator developer, I'm genuinely excited about these advancements. If possible, could you kindly share an approximate timeline for the upcoming splitting changes? I can't wait to begin integrating our accelerator profiler with these impressive updates. Additionally, may I ask whether Kineto is still open to accepting contributions of third-party accelerator profilers as plugins after these changes are merged? Thank you! Lastly, happy new year to you and the community (maybe a little bit late, hah)!
cc @sraikund16 @malfet @valentinandrei, could you help review this PR? This is the first split of my original longer PR, but I also introduced the generic events that came up in the last PR discussion.
@KarhouTam thanks for the feedback :)
It should not take me too long to develop the changes, roughly 1-2 weeks. There is one more, somewhat larger PR that should enable the dynamic plugin interface.
Actually, that is what our proposal in #1121 was about. @malfet / @valentinandrei could we meet on a VC to discuss this? It is totally reasonable if we need to address security and code issues in the PRs, but if dynamic loading itself is an issue then it would be good to know that before putting effort into this. Also see: pytorch/pytorch#166205 (comment)
@briancoutinho can you please coordinate with @eqy to get added to, say, the Tuesday meeting series, and we can discuss it there. And giving you write permissions to the repo is not a problem imo.
Hi @briancoutinho, thank you so much for introducing this significant work; it's incredibly valuable for external accelerators.
Would you mind sharing those concerns here after discussing with the Meta maintainers later? As far as I know, the ONNX Runtime framework has been using dynamic loading for years to support multiple accelerators, which really helps in integrating new accelerators.
Thanks @malfet and @eqy! Please also include @yisitu in the discussion.
Ping @eqy, can you please include Situ and me in the sync above?
Overview
Originally, this was part 1 of splitting PR #1148. It adds a new kind of GPU counter event that will be published to the timeline as a time series. In the process I realized we should add more generic event types for accelerators rather than being tied to CUDA-specific naming. Historically, this has led to each new accelerator adding its own events, which is a maintenance burden.
The `ActivityType` enum has been restructured: there are now aliases in the enum class definition for older events like `CUDA_RUNTIME` and `MTIA_RUNTIME`, to name a few. The dynamic plugin will leverage generic events so upstream source code changes will not be required for new accelerators.

Details
Accelerator-Agnostic Event Types
The change reorganizes the `ActivityType` enum class to introduce generic, accelerator-agnostic event types that work across all hardware backends (CUDA, MTIA, HPU, XPU, etc.). Device-specific types are now deprecated aliases pointing to their generic counterparts. There are a few corner-case exceptions like `MTIA_INSIGHT` and `CUDA_SYNC`; see the header.

New Generic Event Types
- `RUNTIME`: replaces `CUDA_RUNTIME`, `MTIA_RUNTIME`, `GLOW_RUNTIME`, `XPU_RUNTIME`, `PRIVATEUSE1_RUNTIME`, `HPU_OP`
- `DRIVER`: replaces `CUDA_DRIVER`, `PRIVATEUSE1_DRIVER`
- `CONCURRENT_KERNEL`: replaces `MTIA_CCP_EVENTS`
- `GPU_PM_COUNTER`: new type for GPU Performance Monitoring counter events

Guidance for future use of ActivityTypes
Existing code using deprecated aliases will continue to work, but new code should use the generic types:
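For instance, a minimal sketch of a new call site (the include path and the use of a `std::set` here are my own illustration; only the `ActivityType` names come from this PR):

```cpp
#include <set>

#include <libkineto.h>  // assumed include; ActivityType is declared in Kineto's ActivityType.h

using libkineto::ActivityType;

// Prefer the generic, accelerator-agnostic names. The deprecated aliases
// (e.g. CUDA_RUNTIME, MTIA_RUNTIME) still compile because they map to the
// same underlying enum values.
const std::set<ActivityType> activities{
    ActivityType::CPU_OP,
    ActivityType::RUNTIME,            // instead of CUDA_RUNTIME / MTIA_RUNTIME / ...
    ActivityType::CONCURRENT_KERNEL,  // instead of MTIA_CCP_EVENTS
    ActivityType::GPU_PM_COUNTER,     // new GPU Performance Monitoring counter events
};
```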
I have not changed the usage of these types in the codebase yet. That can happen in a follow-up change.
Notes
`defaultActivityTypesArray` is now constexpr, enabling compile-time evaluation and eliminating runtime overhead.
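Roughly the idea, as a sketch (the contents below are illustrative, not the actual array):

```cpp
#include <array>

#include <libkineto.h>  // assumed include exposing libkineto::ActivityType

// Illustrative only: the real defaultActivityTypesArray lives in Kineto's
// sources and has different contents. The point is that a constexpr array
// is constructed and can be checked entirely at compile time.
constexpr std::array<libkineto::ActivityType, 3> kDefaults{
    libkineto::ActivityType::CPU_OP,
    libkineto::ActivityType::RUNTIME,
    libkineto::ActivityType::CONCURRENT_KERNEL,
};
static_assert(kDefaults.size() == 3, "no runtime initialization needed");
```

GPU PM Counter Events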
This is a straightforward change to emit Chrome Trace counter events for counters obtained from the GPU. The event can be leveraged by any accelerator backend.
The values of the counters are embedded as key/value pairs in the output JSON.
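For reference, a Chrome Trace counter event ("ph": "C") carries its values under "args"; it would look roughly like the following, where the counter names and exact fields are illustrative and not necessarily what Kineto emits:

```json
{
  "ph": "C",
  "name": "GPU PM Counters",
  "pid": 0,
  "ts": 1700000000000,
  "args": {
    "sm_active_cycles": 123456,
    "dram_read_bytes": 1048576
  }
}
```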
Testing
The `ParseTest.ActivityTypes` test validates that aliases in the Config string are correctly converted to the underlying base type. Since the enum class uses aliases, all existing code in Kineto that uses the original activity types continues to compile and work as expected. The test above also checks that the default activity types are unchanged:
https://github.com/pytorch/kineto/blob/main/libkineto/test/ConfigTest.cpp#L89-L108
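Because the deprecated names are enum aliases, they compare equal to their generic counterparts at compile time; a hypothetical check (not a test in this PR) would be:

```cpp
#include <libkineto.h>  // assumed public header exposing ActivityType

// Deprecated aliases share the underlying value of their generic counterpart,
// so they compare equal and existing call sites keep compiling unchanged.
static_assert(libkineto::ActivityType::CUDA_RUNTIME ==
              libkineto::ActivityType::RUNTIME);
static_assert(libkineto::ActivityType::MTIA_RUNTIME ==
              libkineto::ActivityType::RUNTIME);
```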