
Introduce XPU scope profiler extending existing XPU profiler plugin #1174

Open
moksiuc wants to merge 34 commits into pytorch:main from moksiuc:moksiuci_6674_scope_profiler

Conversation

@moksiuc
Contributor

@moksiuc moksiuc commented Nov 13, 2025

Summary:

As XPU became a PyTorch built-in device, profiler support is an indispensable part of functional completeness. This PR introduces the XPU scope profiler by extending the existing XPU profiler plugin. The XPU scope profiler is built on the foundation of the Intel PTI toolkit (https://github.com/intel/pti-gpu) and the underlying SYCL runtime, and it allows gathering XPU hardware metrics. The LIBKINETO_NOXPUPTI option enables or disables the whole XPU profiler plugin at kineto build time.

Changes:

  • Added a new ActivityType, XPU_SCOPE_PROFILER, enabling the new scope profiler
  • Enhanced the ChromeTraceLogger::handleActivity method so it outputs XPU hardware metrics from the new scope profiler as Perfetto counter events (display mode "C")
  • Added a gtest
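For context on the ChromeTraceLogger change, a Chrome Trace Format counter event (phase "C") of the kind described above can be sketched as follows. This is a minimal illustration of the public trace format, not the PR's actual output; the track name, metric name, and pid/tid values are hypothetical placeholders.

```python
import json

def make_counter_event(name, ts_us, values, pid=0, tid=0):
    """Build one Chrome Trace Format counter event.

    "ph": "C" marks a counter event, which trace viewers such as Perfetto
    render as a counter track. Each key in "args" becomes one series.
    """
    return {
        "ph": "C",        # counter phase
        "name": name,     # counter track name (hypothetical here)
        "ts": ts_us,      # timestamp in microseconds
        "pid": pid,
        "tid": tid,
        "args": values,   # e.g. {"gpu_busy_percent": 87.5}
    }

event = make_counter_event("XPU HW metrics", 1000, {"gpu_busy_percent": 87.5})
trace = {"traceEvents": [event]}
print(json.dumps(trace))
```

Loading a file with such events into a Chrome-trace-compatible viewer displays each `args` key as its own counter series over time.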

@meta-cla meta-cla bot added the cla signed label Nov 13, 2025
@moksiuc moksiuc changed the title from "scope profiler squashed" to "Introduce XPU scope profiler extending existing XPU profiler plugin" Nov 13, 2025
@moksiuc
Contributor Author

moksiuc commented Nov 13, 2025

@EikanWang, @gujinghui

@gujinghui

@moksiuc It's great that we are going to update our PTI integration code and introduce a new profiler path.
Could you help address the questions below?

  1. Is the ScopeProfiler the XPU alternative of the RangeProfiler for CUDA? It looks like the RangeProfiler is not enabled in PyTorch by default so far. Do you know why?
  2. This PR is too large to review. Can we split it into several PRs? For example, one PR for code refactoring or cleanup of the kineto or PTI changes, one or two PRs for the ScopeProfiler, and one PR for the ChromeTraceLogger enhancement, with test cases added to each PR.
  3. BTW, CUDA provides the CUDA_DRIVER activity to trace driver actions. We should provide L0 actions as the counterpart, right? I remember PTI should be able to do that. Do we have a plan to cover it?
    {"cuda_driver", ActivityType::CUDA_DRIVER},

@moksiuc
Contributor Author

moksiuc commented Nov 14, 2025

  1. Is the ScopeProfiler the XPU alternative of the RangeProfiler for CUDA? It looks like the RangeProfiler is not enabled in PyTorch by default so far. Do you know why?
    It is enabled by providing experimental_config=_ExperimentalConfig(...). I don't know why it is this way, but we are enabling our profiler the same way. One reason may be that the Range/Scope profiler requires parameters, such as HW metric names, that are passed through _ExperimentalConfig.
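The enabling path described above can be sketched as follows. This is a hedged illustration: the metric name is a hypothetical placeholder, the exact `_ExperimentalConfig` fields may vary across PyTorch versions, and the helper simply returns `None` when PyTorch is not installed.

```python
def build_profiler_kwargs():
    """Sketch: keyword arguments for torch.profiler.profile() that switch on
    per-kernel HW-metric collection via _ExperimentalConfig.

    The metric name below is a hypothetical placeholder; real metric names
    are backend-specific (CUPTI or PTI).
    """
    try:
        from torch._C._profiler import _ExperimentalConfig
    except ImportError:
        return None  # PyTorch not available in this environment
    return {
        "experimental_config": _ExperimentalConfig(
            profiler_metrics=["hypothetical_metric_name"],  # placeholder
            profiler_measure_per_kernel=True,
        ),
    }
```

Without `experimental_config`, the profiler runs in its default tracing mode; passing it is what opts in to the extra metric collection, which matches the opt-in behavior discussed in this comment.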

@moksiuc
Contributor Author

moksiuc commented Nov 14, 2025

  1. BTW, CUDA provides the CUDA_DRIVER activity to trace driver actions. We should provide L0 actions as the counterpart, right? I remember PTI should be able to do that. Do we have a plan to cover it?
    {"cuda_driver", ActivityType::CUDA_DRIVER},

For sure not in this PR. I'll add this to our list of tasks.

@moksiuc
Contributor Author

moksiuc commented Nov 17, 2025

  1. This PR is too large to review. Can we split it into several PRs? For example, one PR for code refactoring or cleanup of the kineto or PTI changes, one or two PRs for the ScopeProfiler, and one PR for the ChromeTraceLogger enhancement, with test cases added to each PR.

I extracted the cleanup and the scope profiler config changes into separate PRs; this one should be much smaller afterwards.
Currently I don't see further areas to extract into separate PRs, as what would remain is the full scope profiler implementation with tests, and we would prefer not to introduce half of an implementation that does not work functionally.

- removed rangeEnabled
- fixed the test to align with this removal
- erased the used kernelActivity from the map
- changed the place of config initialization
- removed passing an unused C compiler flag into the test cmake file

@gujinghui gujinghui left a comment


This PR is split to #1177, #1180, and more.

@moksiuc moksiuc marked this pull request as ready for review November 24, 2025 10:01
@gujinghui

@moksiuc Let's close this PR.

@moksiuc
Contributor Author

moksiuc commented Dec 3, 2025

@gujinghui this is the core of the scope profiler. When the 2 smaller parts are merged, this one will have only the core profiler left.

@moksiuc
Contributor Author

moksiuc commented Dec 22, 2025

LGTM. I assume you already verified it on a real local machine, right? @moksiuc

Yes, I did.

@gujinghui

@malfet @sraikund16 could you help review this PR? Thanks.

@moksiuc
Contributor Author

moksiuc commented Jan 12, 2026

@malfet @sraikund16 could you review, please?

Contributor

@divyanshk divyanshk left a comment


Thanks for the PR. I'm ramping up on this space. I'm curious to understand how much it differs from the existing XPU profiler; maybe it should exist as a separate class in itself? How can I learn more about the scope profiler?

Also, thank you for adding tests.

@moksiuc
Contributor Author

moksiuc commented Jan 13, 2026

Thanks for the PR. I'm ramping up on this space. I'm curious to understand how much it differs from the existing XPU profiler; maybe it should exist as a separate class in itself? How can I learn more about the scope profiler?

Also, thank you for adding tests.

This PR adds a new activity, ActivityType::XPU_SCOPE_PROFILER, accepted by the XPU profiler. With this new mode enabled, the XPU profiler additionally gathers the requested HW metrics.
I don't think a separate class is necessary, as this just adds one more mode to the few existing ones.
The new mode is the XPU alternative to what is implemented in the CuptiRangeProfiler class.

@moksiuc moksiuc requested a review from divyanshk January 13, 2026 09:04
@gujinghui

gujinghui commented Jan 14, 2026

Thanks for the PR. I'm ramping up on this space. I'm curious to understand how much does it differ from the existing XPU profiler - maybe it should exist as a separate class in itself ? How can I learn more about the scope profiler ?
Also, thank you for adding tests.

This PR adds a new activity, ActivityType::XPU_SCOPE_PROFILER, accepted by the XPU profiler. With this new mode enabled, the XPU profiler additionally gathers the requested HW metrics. I don't think a separate class is necessary, as this just adds one more mode to the few existing ones. The new mode is the XPU alternative to what is implemented in the CuptiRangeProfiler class.

Hi @divyanshk ,

For more details on CuptiRangeProfiler, please see this page: https://deepwiki.com/pytorch/kineto/3.1-nvidia-gpu-support-(cupti)#4-cupti-range-profiler-integration

@moksiuc is going to provide similar functionalities on XPU in this PR.

@scotts
Contributor

scotts commented Jan 15, 2026

Hi, @moksiuc, I'm also a new maintainer to this library. Thanks for all of the recent cleanup PRs!

On this PR, I have a similar concern to @divyanshk's about the code structuring. Specifically, this code uses #ifdefs to conditionally add member functions and member structs. This library has a history of doing that, but we want to start untangling it, as it's difficult to maintain.

We should be able to achieve the same functionality by defining XpuptiActivityScopeApi (or a similar name) that derives from XpuptiActivityApi. Then we can use the #ifdefs to guard the entire files, which is much easier to reason about. You may need to still modify the XpuptiActivityApi interface itself a little so that it all works cleanly with XpuActivityProfiler, but that seems reasonable.
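The restructuring suggested here can be sketched as follows, in Python for brevity (the real code is C++). The class and method names come from this discussion, but the method bodies are hypothetical placeholders.

```python
class XpuptiActivityApi:
    """Base API, always compiled (in the real C++ code)."""
    def enable_activities(self, activity_types):
        # placeholder body: record which activity types are on
        self.enabled = set(activity_types)

# In the C++ layout scotts proposes, this derived class would live in its
# own file, guarded by a single #ifdef around the whole file rather than
# per-member #ifdefs scattered through the base class.
class XpuptiActivityScopeApi(XpuptiActivityApi):
    """Scope-profiler extension: adds HW-metric collection on top."""
    def enable_scope_metrics(self, metric_names):
        # placeholder body: record which HW metrics to gather
        self.metrics = list(metric_names)

api = XpuptiActivityScopeApi()
api.enable_activities(["XPU_SCOPE_PROFILER"])
api.enable_scope_metrics(["hypothetical_metric"])
```

The design point is that the base interface stays unconditional, while everything scope-specific is isolated in the derived type, so the conditional compilation boundary coincides with a file boundary.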

@ZhaoqiongZ ZhaoqiongZ moved this to Aged Pending Review in PyTorch Intel Jan 16, 2026
@ZhaoqiongZ ZhaoqiongZ moved this from Aged Pending Review to In Progress in PyTorch Intel Jan 16, 2026
@moksiuc
Contributor Author

moksiuc commented Jan 20, 2026

@divyanshk , @scotts I've split files and moved the new code to new files. Please review.

Contributor

@divyanshk divyanshk left a comment


@moksiuc @gujinghui Can you help me understand a few things? Thanks.
1.) Is a different Intel PTI API being leveraged for the range profiler? How stable and well-supported is that API? I ask because CuptiRangeProfiler uses a different CUPTI API (the CUPTI Profiling API) from the canonical profiler; that API is also marked for deprecation, so we are thinking of deprecating and deleting CuptiRangeProfiler.
2.) Can you point me to the official documentation for Intel's scope profiler API?
3.) Deferring the code restructuring concerns to @scotts: I am starting to think ActivityType might be over-used here, because an ActivityType should signify profiling activities/events, whereas CUDA_PROFILER_RANGE and XPU_SCOPE_PROFILER are independent profilers. Thoughts?

@moksiuc
Contributor Author

moksiuc commented Feb 2, 2026

@divyanshk,

  1. The PTI API used for the XPU scope profiler was introduced in version 0.15 of https://github.com/intel/pti-gpu. According to their API policy, it will always stay backwards compatible, so it will not change in a way that breaks compatibility.
  2. The documentation is not ready yet, as the feature is not yet available; it exists only in this PR. It will be prepared as soon as the feature is available.
  3. Maybe CUDA_PROFILER_RANGE works differently, but XPU_SCOPE_PROFILER is not an independent profiler; it is an additional mode of the current profiler. It works independently of the other profiling modes and gathers additional HW metrics.

@moksiuc moksiuc requested a review from divyanshk February 4, 2026 08:56
@moksiuc
Contributor Author

moksiuc commented Feb 4, 2026

@scotts, could you review the restructured code, please?


Projects

Status: In Progress

Development


6 participants