[WIP] Ncu profile by ngc92 · Pull Request #368 · gpu-mode/kernelbot

ngc92 · 2025-11-01T18:53:55Z

No description provided.

github-actions · 2025-11-01T18:54:49Z

Coverage report


File	Statements	Missing	Coverage	Coverage (new stmts)	Lines missing
src/libkernelbot
report.py					73, 350
Project Total

This report was generated by python-coverage-comment-action


Copilot

Pull Request Overview

This is a work-in-progress PR that adds NVIDIA Nsight Compute (NCU) profiling support to complement the existing ROCm profiling capabilities. The changes refactor the profiling infrastructure to support multiple profilers and enable per-benchmark profiling runs.

Adds NCU profiling implementation with filtered report generation
Refactors profiling to run each benchmark separately and embed trace data in results
Updates test infrastructure and workflows for new GPU runners
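As a rough illustration of embedding trace data in results, the sketch below zips the contents of a temporary directory and base64-encodes the archive so it can travel inside a text result. It assumes the trace is a base64-encoded zip, which is suggested by the `.zip` attachment name and the `base64.b64decode(prof_run.profile.trace)` call quoted later in this review. The names `pack_trace` and `unpack_names` are hypothetical and do not come from the PR.

```python
import base64
import io
import tempfile
import zipfile
from pathlib import Path


def pack_trace(directory: Path) -> str:
    """Zip every file under `directory` and return the archive as base64 text."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(directory.rglob("*")):
            if path.is_file():
                # Store paths relative to the profile directory.
                archive.write(path, path.relative_to(directory).as_posix())
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def unpack_names(trace: str) -> list[str]:
    """Decode a packed trace and list the files it contains."""
    with zipfile.ZipFile(io.BytesIO(base64.b64decode(trace))) as archive:
        return archive.namelist()


# Hypothetical usage: a placeholder file stands in for real profiler output.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "profile.txt").write_text("placeholder profiler output")
    trace = pack_trace(out)

names = unpack_names(trace)
```

Because the archive is built in memory, nothing needs to outlive the temporary directory once the encoded string has been produced.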

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 14 comments.


File	Description
src/libkernelbot/run_eval.py	Adds NCU profiling function, refactors profile_program to use temporary directories, changes parameter order to use keyword-only arguments, and modifies run_evaluation to handle per-benchmark profiling
src/libkernelbot/report.py	Adds File dataclass for attaching profile files to reports, updates report generation to handle multiple profile runs, and adds _shortname helper for filename generation
tests/test_report.py	Updates tests to reflect new profile reporting structure with trace field and File attachments
examples/eval.py	Adds _run_single_profile_ncu function for NCU-specific profiling and updates profile selection logic based on POPCORN_NCU environment variable
src/kernelbot/discord_utils.py	Adds _send_file helper function and updates _send_split_log to add newlines properly and send messages silently
src/kernelbot/discord_reporter.py	Adds File handling to display_report for uploading profile attachments
src/kernelbot/api/api_utils.py	Adds check to block profile submissions via API and reformats code
src/runners/modal_runner.py	Updates Python version from 3.12 to 3.13 in CUDA image
.github/workflows/nvidia_workflow.yml	Updates runner to nvidia-docker-b200-8-x86-64, removes container configuration, and simplifies setup steps
.github/workflows/nvidia-arc-health.yml	Updates runner and removes Python/PyTorch installation steps
src/libkernelbot/launchers/github.py	Adds conditional check for profile-data artifact before setting download_url and reduces polling interval
scripts/ci_test_python.py	Updates test to use python3 command and keyword argument for system parameter
scripts/ci_test_cuda.py	Updates tests to use keyword argument for system parameter
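The switch to keyword-only arguments mentioned in the summary can be illustrated with a minimal sketch. `describe_run` is a hypothetical function, not one from the PR; its parameter names merely echo the `run_program(call, seed=..., timeout=..., multi_gpu=...)` call quoted later in this review.

```python
from typing import Optional


def describe_run(call: list[str], *, seed: Optional[int], timeout: int, multi_gpu: bool = False) -> dict:
    """Hypothetical helper: everything after `*` must be passed by keyword."""
    return {"call": call, "seed": seed, "timeout": timeout, "multi_gpu": multi_gpu}


settings = describe_run(["python3", "eval.py"], seed=42, timeout=30)

try:
    describe_run(["python3", "eval.py"], 42, 30)  # positional use is rejected
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Callers such as the CI test scripts then have to spell out each argument by name, so reordering parameters later cannot silently change their meaning.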


Copilot · 2025-11-10T19:29:51Z

+            if _handle_crash_report(report, prof_run):
+                return report
+
+            if prof_run.profile.trace is not None:


Potential AttributeError if prof_run.profile is None. The code checks prof_run.profile.trace is not None without first verifying that prof_run.profile itself is not None. Consider adding a null check:

if prof_run.profile is not None and prof_run.profile.trace is not None:

Suggested change

-            if prof_run.profile.trace is not None:
+            if prof_run.profile is not None and prof_run.profile.trace is not None:
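The guard can be tried in isolation with the sketch below. The two dataclasses are hypothetical stand-ins for the PR's result types; only the attribute names `profile`, `profiler`, and `trace` are taken from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfileResult:
    # Hypothetical stand-in; only `profiler` and `trace` appear in the diff.
    profiler: str
    trace: Optional[str] = None


@dataclass
class ProfRun:
    # Hypothetical stand-in for one profiling run; `profile` may be absent.
    profile: Optional[ProfileResult] = None


def has_trace(prof_run: ProfRun) -> bool:
    """Return True only when a profile exists and carries a trace."""
    return prof_run.profile is not None and prof_run.profile.trace is not None
```

The `and` short-circuits, so `prof_run.profile.trace` is never evaluated when `profile` is `None`.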

Copilot · 2025-11-10T19:29:51Z

+    if submission_mode_enum == SubmissionMode.PROFILE:
+        raise HTTPException(
+            status_code=400,
+            detail="Profile submissions are not currently supported via API, use Discord instead.",
+        )


Duplicate check for SubmissionMode.PROFILE on lines 214 and 225. The first check already raises an exception if the mode is PROFILE, so the second check on line 225 is unreachable. Consider removing the duplicate check on lines 225-229.

Suggested change

-    if submission_mode_enum == SubmissionMode.PROFILE:
-        raise HTTPException(
-            status_code=400,
-            detail="Profile submissions are not currently supported via API, use Discord instead.",
-        )

Copilot · 2025-11-10T19:29:52Z

    - name: Create input files
      shell: bash


The "Create input files" step runs apt-get commands without sudo and without a container context. The previous version used a container (image: nvidia/cuda:12.4.0-devel-ubuntu22.04), which provided a root environment. Without a container, these commands will likely fail on the self-hosted runner unless the runner is configured to run as root (which is a security risk). Consider either restoring the container setup or using sudo for the apt-get commands, or ensure jq is pre-installed on the runner.

Copilot · 2025-11-10T19:29:52Z

    def log_one(base_name):
-        spec = run.result.get(f"{base_name}.spec")
-
        report: str = run.result.get(f"{base_name}.report")


Potential error if report is None. The code calls .encode() on report without checking if it exists first. If benchmark.{i}.report is not in the result dictionary, this will raise an AttributeError. Consider adding a check:

        report: str = run.result.get(f"{base_name}.report")
        if report is None:
            return
        report = base64.b64decode(report.encode("utf-8"), b"+*").decode("utf-8")

Suggested change

-        report: str = run.result.get(f"{base_name}.report")
+        report: str = run.result.get(f"{base_name}.report")
+        if report is None:
+            return
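A self-contained sketch of the same check follows. `decode_report` is a hypothetical helper, not the PR's `log_one`; the `b"+*"` alternative alphabet is copied from the line quoted in the comment above.

```python
import base64
from typing import Optional


def decode_report(result: dict, base_name: str) -> Optional[str]:
    """Return the decoded report text, or None when the key is missing."""
    raw = result.get(f"{base_name}.report")
    if raw is None:
        return None
    # altchars b"+*" keeps '+' and substitutes '*' for '/' in the base64 alphabet.
    return base64.b64decode(raw.encode("utf-8"), b"+*").decode("utf-8")


# Hypothetical round trip using the same alternative alphabet.
encoded = base64.b64encode("hello ncu".encode("utf-8"), b"+*").decode("ascii")
decoded = decode_report({"benchmark.0.report": encoded}, "benchmark.0")
missing = decode_report({}, "benchmark.0")
```

Returning `None` rather than raising lets the caller decide whether a missing report is an error or simply means that nothing was logged for that benchmark.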

Copilot · 2025-11-10T19:29:52Z

+    """
+    Profiles a single benchmark using ncu. Note: this does not
+    invoke NCU; instead, it is expected that eval is launched
+    under NCU, and this function will rurnthe kernel excactly


Typo in the docstring: "rurnthe" should be "run the".

Suggested change

-    under NCU, and this function will rurnthe kernel excactly
+    under NCU, and this function will run the kernel excactly

Copilot · 2025-11-10T19:29:53Z

+                    report.add_file(
+                        f"profile-{_shortname(prof_run.run.result.get('benchmark.0.spec'))}.zip",
+                        f"{prof_run.profile.profiler} report - " + prof_run.run.result.get("benchmark.0.spec"),


Potential error if benchmark.0.spec is not in the result dictionary. The get() method will return None if the key doesn't exist, which will cause _shortname() to fail with an AttributeError when trying to call .replace() on None. Consider adding a default value or null check:

                    spec = prof_run.run.result.get('benchmark.0.spec', 'unknown')
                    report.add_file(
                        f"profile-{_shortname(spec)}.zip",
                        f"{prof_run.profile.profiler} report - {spec}",
                        base64.b64decode(prof_run.profile.trace),
                    )

Suggested change

-                    report.add_file(
-                        f"profile-{_shortname(prof_run.run.result.get('benchmark.0.spec'))}.zip",
-                        f"{prof_run.profile.profiler} report - " + prof_run.run.result.get("benchmark.0.spec"),
+                    spec = prof_run.run.result.get('benchmark.0.spec', 'unknown')
+                    report.add_file(
+                        f"profile-{_shortname(spec)}.zip",
+                        f"{prof_run.profile.profiler} report - {spec}",
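The effect of the default value can be sketched as follows. The body of `_shortname` is not shown in this review, so `shortname` below is a hypothetical replacement that collapses non-alphanumeric runs into hyphens, and `profile_attachment_name` is likewise an illustrative name.

```python
import re


def shortname(spec: str) -> str:
    # Hypothetical: the PR's `_shortname` is not shown; this version
    # collapses every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", spec).strip("-")


def profile_attachment_name(result: dict) -> tuple[str, str]:
    """Build the attachment filename and return it with the spec used."""
    spec = result.get("benchmark.0.spec", "unknown")
    return f"profile-{shortname(spec)}.zip", spec


fallback = profile_attachment_name({})
named = profile_attachment_name({"benchmark.0.spec": "k: 128; seed: 1"})
```

Note that `dict.get` only applies the default when the key is absent. If the key may be present with a `None` value, `result.get("benchmark.0.spec") or "unknown"` would cover that case as well.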

Copilot · 2025-11-10T19:29:54Z

+
+            if prof_run.profile.trace is not None:
+                report.add_log(
+                    f"Profiling {prof_run.run.result.get('benchmark.0.spec')}",


Potential error if benchmark.0.spec is not in the result dictionary. The get() method will return None if the key doesn't exist, which will be interpolated as the string "None" in the log header. Consider adding a default value:

f"Profiling {prof_run.run.result.get('benchmark.0.spec', 'unknown benchmark')}"

Suggested change

-                    f"Profiling {prof_run.run.result.get('benchmark.0.spec')}",
+                    f"Profiling {prof_run.run.result.get('benchmark.0.spec', 'unknown benchmark')}",

Copilot · 2025-11-10T19:29:54Z

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        with nvtx_range("custom_kernel"):
-            submission_output = custom_kernel(_clone_data(data, 0))
+            submission_output = custom_kernel(cloned)


Variable submission_output is not used.

Copilot · 2025-11-10T19:29:54Z

+
+    cloned = _clone_data(data, 0)
+    with nvtx_range("custom_kernel"):
+        submission_output = custom_kernel(cloned)


Variable submission_output is not used.

Suggested change

-        submission_output = custom_kernel(cloned)
+        custom_kernel(cloned)

Copilot · 2025-11-10T19:29:55Z

+    ] + call
+
+    run_result = run_program(
+        call, seed=seed, timeout=timeout, multi_gpu=multi_gpu, extra_env={"POPCORN_NCU": "1"}


'except' clause does nothing but pass and there is no explanatory comment.

* Change runner from gpumode-nvidia-arc to Nvidia-A100
* Update nvidia-arc-health.yml
* Update nvidia-arc-health.yml
* Feat: run health on b200
* tmp
* tmp
* tmp
* feat
* feat
* feat
* replace nvidia workflow to point to our b200 cluster
* Fix: container
* Fix: python->python3
* Fix: add back deps
* Fix: python->python3
* Fix: python->python3
* Add nvidia-smi
* split profiling into rocm/ncu; small code improvements
* profile each benchmark individually for cleaner traces
* profile in tempdir
* send profile results as attached files
* don't spam alerts
* include default ncu report
* attempt at filtered ncu
* formatting fix
* fix tests
* Fix: good error for profile via api
* Fix: remove nvidia-smi from workflow
* Fix: polling time to 15s
* limit profiling report length
* limit number of kernels to be profiled
* stricter matching for kernel name lines
* add an additional safety limit to ncu reports
* fix
* Fix: style

---------

Co-authored-by: Mark Saroufim <marksaroufim@meta.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
Co-authored-by: Alex Zhang <alex.lx.zhang@gmail.com>

Mark Saroufim and others added 17 commits October 13, 2025 16:32

Change runner from gpumode-nvidia-arc to Nvidia-A100

3bf6f64

Update nvidia-arc-health.yml

5f40e36

Update nvidia-arc-health.yml

e3ac730

Feat: run health on b200

c60090b

tmp

2a69a10

tmp

9a6c08d

tmp

aa2f894

feat

fbc28ad

feat

6437e19

feat

6c4bde0

replace nvidia workflow to point to our b200 cluster

a3e045c

Fix: container

844d3bf

Fix: python->python3

3275924

Fix: add back deps

b19b59b

Fix: python->python3

3e8eb6f

Fix: python->python3

998cf42

Add nvidia-smi

1de31fd

ngc92 force-pushed the ncu-profile branch 3 times, most recently from 248c411 to ce0c6ec Compare November 1, 2025 19:34

split profiling into rocm/ncu; small code improvements

d754094

ngc92 force-pushed the ncu-profile branch from ce0c6ec to 954241f Compare November 9, 2025 12:44

profile each benchmark individually for cleaner traces

394e234

ngc92 force-pushed the ncu-profile branch 2 times, most recently from ae72216 to 4928fc2 Compare November 9, 2025 13:39

profile in tempdir

0e51cf5

ngc92 force-pushed the ncu-profile branch 3 times, most recently from de51bbf to 779ee7f Compare November 9, 2025 14:22

send profile results as attached files

3e6a59c

ngc92 force-pushed the ncu-profile branch from 779ee7f to 3e6a59c Compare November 9, 2025 14:26

don't spam alerts

f31e4bb

ngc92 force-pushed the ncu-profile branch from 4540408 to a73eddc Compare November 9, 2025 14:42

include default ncu report

00c215a

ngc92 force-pushed the ncu-profile branch from a73eddc to 00c215a Compare November 9, 2025 14:45

attempt at filtered ncu

b014b79

ngc92 force-pushed the ncu-profile branch from 0ddd352 to b014b79 Compare November 9, 2025 15:19

formatting fix

f328eba

ngc92 force-pushed the ncu-profile branch 4 times, most recently from 16b4d0c to d8d7145 Compare November 9, 2025 16:22

fix tests

eaa54f7

ngc92 force-pushed the ncu-profile branch from d8d7145 to eaa54f7 Compare November 9, 2025 16:43

S1ro1 and others added 5 commits November 10, 2025 19:05

Fix: good error for profile via api

e83b0f4

Fix: remove nvidia-smi from workflow

716aca9

Fix: polling time to 15s

cb880a7

limit profiling report length

2621ca1

limit number of kernels to be profiled

af80b61

ngc92 force-pushed the ncu-profile branch from 3adb49b to af80b61 Compare November 10, 2025 18:26

ngc92 added 3 commits November 10, 2025 19:30

stricter matching for kernel name lines

2931fd4

add an additional safety limit to ncu reports

110386e

fix

8a4c6b2

S1ro1 marked this pull request as ready for review November 10, 2025 19:22

Copilot AI review requested due to automatic review settings November 10, 2025 19:22

Fix: style

c9786fb

S1ro1 merged commit c1973b6 into main Nov 10, 2025
5 of 7 checks passed

Copilot AI reviewed Nov 10, 2025

S1ro1 merged 35 commits into main from ncu-profile


4 participants
