rocprofiler-sdk support #1249

Open
scotts wants to merge 1 commit into pytorch:main from scotts:export-D92225068

scotts (Contributor) commented Feb 6, 2026

Summary:
Migrate ROCm support from roctracer64-lazy to rocprofiler-sdk-lazy. The history of previous attempts:

  1. Original PR: Add rocprofiler-sdk support #1050. Eventually this PR could no longer be rebased onto main.
  2. Attempt 1 to land, copied from the original PR: D82773951; Add rocprofiler-sdk support 2 #1128. Reverted because of remote execution failures.
  3. Attempt 2 to land: D86455336; Backout D85735557 #1168. Reverted because of a 10% performance regression.

The performance regression has been resolved. The code has changed in two major ways since the previous attempts:

  1. The activity profilers were refactored in #1219 (refactor activity profiler), and this code had to be adapted to that refactor.
  2. We encountered segfaults during process exit in tests that did not actually do any profiling. This is probably a bug in rocprofiler-sdk-lazy. To avoid it, we're using std::atexit() to force an early exit. Thanks to @aaronwlma for the suggestion!
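
For readers unfamiliar with the `std::atexit()` workaround in item 2, here is a minimal sketch of the general technique. It is not the code in this PR. The `sketch` namespace, `exitStatus()`, `earlyExitHandler()`, and `installEarlyExit()` are hypothetical names chosen for illustration, and the sketch assumes the crash happens in teardown that runs after the registered handler.

```cpp
#include <cstdio>
#include <cstdlib>

namespace sketch {

// Hypothetical: the status to report. An atexit handler cannot see the value
// that was passed to std::exit() or returned from main(), so callers that
// care about a non-zero status would have to store it here themselves.
inline int& exitStatus() {
  static int status = EXIT_SUCCESS;
  return status;
}

// Runs during normal termination. Handlers and static destructors run in
// reverse order of registration/construction, so anything registered or
// constructed before this handler is skipped once std::_Exit is called.
inline void earlyExitHandler() {
  std::fflush(nullptr);      // std::_Exit is not required to flush stdio buffers
  std::_Exit(exitStatus());  // terminate now: no further handlers or destructors
}

// Registers the handler once; returns true if registration succeeded.
inline bool installEarlyExit() {
  static const bool installed = (std::atexit(earlyExitHandler) == 0);
  return installed;
}

}  // namespace sketch
```

A caller would invoke `sketch::installEarlyExit()` after the state that must be skipped at exit has been set up. The trade-off is that every later-running cleanup step is skipped too, and the process status comes from `exitStatus()` rather than from `main()`, so a wrapper like this can mask failures unless the status is recorded explicitly.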

Differential Revision: D92225068

@meta-cla meta-cla bot added the cla signed label Feb 6, 2026

meta-codesync bot commented Feb 6, 2026

@scotts has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92225068.

scotts added a commit to scotts/kineto that referenced this pull request Feb 6, 2026
scotts added a commit to scotts/kineto that referenced this pull request Feb 7, 2026