Just a suggestion on how to implement GPU kernel timing functions using Events:
cudaEvent_t startTime, stopTime;
float timeInMs;
cudaEventCreate(&startTime);
cudaEventCreate(&stopTime);
And before / after every kernel launch:
cudaEventRecord(startTime);
launch_kernel<<<blablabla>>>(...);
cudaEventRecord(stopTime);
cudaEventSynchronize(stopTime);
cudaEventElapsedTime(&timeInMs, startTime, stopTime);
And of course the equivalent HIP functions (hipEventCreate, hipEventRecord, hipEventSynchronize, hipEventElapsedTime).
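For reference, here is how the pieces above might fit together in one small program. This is only a sketch: the `scale` kernel, the buffer size, and the launch configuration are made up for illustration and are not taken from any existing code. It also destroys the events at the end, which the snippet above leaves out.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only so there is something to time.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;  // assumed problem size
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Create the events once, up front.
    cudaEvent_t startTime, stopTime;
    cudaEventCreate(&startTime);
    cudaEventCreate(&stopTime);

    // Record on the default stream immediately before and after the launch.
    cudaEventRecord(startTime);
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaEventRecord(stopTime);

    // Block the host until stopTime has completed, then read the elapsed time.
    cudaEventSynchronize(stopTime);
    float timeInMs = 0.0f;
    cudaEventElapsedTime(&timeInMs, startTime, stopTime);
    printf("kernel time: %.3f ms\n", timeInMs);

    // Release the events and the device buffer.
    cudaEventDestroy(startTime);
    cudaEventDestroy(stopTime);
    cudaFree(d_data);
    return 0;
}
```

Since the events can be reused, the create/destroy calls only need to happen once, and the record/synchronize/elapsed-time sequence can be repeated around each kernel launch.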