Replace CUPTI callback with delayed init for NVIDIA GPUs #1250

Open
sanrise wants to merge 1 commit into pytorch:main from sanrise:export-D92570679
sanrise (Contributor) commented Feb 7, 2026

Summary:
Previously, kineto registered a permanent CUPTI callback on CUDA context creation to initialize the profiler. This blocked external tools like wprof from subscribing to CUPTI since only one subscriber is allowed.

This change replaces the CUPTI callback mechanism with a delayed initialization approach using a condition_variable-based timer. The profiler now initializes 1 second after libkineto_init, freeing CUPTI for on-demand profiling tools. The new DelayedInitializer class supports immediate cancellation on shutdown, avoiding the blocking behavior of std::future destructors.

Differential Revision: D92570679

@meta-cla meta-cla bot added the cla signed label Feb 7, 2026

meta-codesync bot commented Feb 7, 2026

@sanrise has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92570679.
