Skip to content

[WIP] Ncu profile#368

Merged
S1ro1 merged 35 commits into
mainfrom
ncu-profile
Nov 10, 2025
Merged

[WIP] Ncu profile#368
S1ro1 merged 35 commits into
mainfrom
ncu-profile

Conversation

@ngc92

@ngc92 ngc92 commented Nov 1, 2025

Copy link
Copy Markdown
Collaborator

No description provided.

@github-actions

github-actions Bot commented Nov 1, 2025

Copy link
Copy Markdown

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  src/libkernelbot
  report.py 73, 350
Project Total  

This report was generated by python-coverage-comment-action

@ngc92 ngc92 force-pushed the ncu-profile branch 3 times, most recently from 248c411 to ce0c6ec Compare November 1, 2025 19:34
small code improvements
@ngc92 ngc92 force-pushed the ncu-profile branch 2 times, most recently from ae72216 to 4928fc2 Compare November 9, 2025 13:39
@ngc92 ngc92 force-pushed the ncu-profile branch 3 times, most recently from de51bbf to 779ee7f Compare November 9, 2025 14:22
@ngc92 ngc92 force-pushed the ncu-profile branch 4 times, most recently from 16b4d0c to d8d7145 Compare November 9, 2025 16:22
@S1ro1 S1ro1 marked this pull request as ready for review November 10, 2025 19:22
Copilot AI review requested due to automatic review settings November 10, 2025 19:22
@S1ro1 S1ro1 merged commit c1973b6 into main Nov 10, 2025
5 of 7 checks passed

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This is a work-in-progress PR that adds NVIDIA Nsight Compute (NCU) profiling support to complement the existing ROCm profiling capabilities. The changes refactor the profiling infrastructure to support multiple profilers and enable per-benchmark profiling runs.

  • Adds NCU profiling implementation with filtered report generation
  • Refactors profiling to run each benchmark separately and embed trace data in results
  • Updates test infrastructure and workflows for new GPU runners

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
src/libkernelbot/run_eval.py Adds NCU profiling function, refactors profile_program to use temporary directories, changes parameter order to use keyword-only arguments, and modifies run_evaluation to handle per-benchmark profiling
src/libkernelbot/report.py Adds File dataclass for attaching profile files to reports, updates report generation to handle multiple profile runs, and adds _shortname helper for filename generation
tests/test_report.py Updates tests to reflect new profile reporting structure with trace field and File attachments
examples/eval.py Adds _run_single_profile_ncu function for NCU-specific profiling and updates profile selection logic based on POPCORN_NCU environment variable
src/kernelbot/discord_utils.py Adds _send_file helper function and updates _send_split_log to add newlines properly and send messages silently
src/kernelbot/discord_reporter.py Adds File handling to display_report for uploading profile attachments
src/kernelbot/api/api_utils.py Adds check to block profile submissions via API and reformats code
src/runners/modal_runner.py Updates Python version from 3.12 to 3.13 in CUDA image
.github/workflows/nvidia_workflow.yml Updates runner to nvidia-docker-b200-8-x86-64, removes container configuration, and simplifies setup steps
.github/workflows/nvidia-arc-health.yml Updates runner and removes Python/PyTorch installation steps
src/libkernelbot/launchers/github.py Adds conditional check for profile-data artifact before setting download_url and reduces polling interval
scripts/ci_test_python.py Updates test to use python3 command and keyword argument for system parameter
scripts/ci_test_cuda.py Updates tests to use keyword argument for system parameter

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if _handle_crash_report(report, prof_run):
return report

if prof_run.profile.trace is not None:

Copilot AI Nov 10, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential AttributeError if prof_run.profile is None. The code checks prof_run.profile.trace is not None without first verifying that prof_run.profile itself is not None. Consider adding a null check:

if prof_run.profile is not None and prof_run.profile.trace is not None:
Suggested change
if prof_run.profile.trace is not None:
if prof_run.profile is not None and prof_run.profile.trace is not None:

Copilot uses AI. Check for mistakes.
Comment on lines +225 to +229
if submission_mode_enum == SubmissionMode.PROFILE:
raise HTTPException(
status_code=400,
detail="Profile submissions are not currently supported via API, use Discord instead.",
)

Copilot AI Nov 10, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate check for SubmissionMode.PROFILE on lines 214 and 225. The first check already raises an exception if the mode is PROFILE, so the second check on line 225 is unreachable. Consider removing the duplicate check on lines 225-229.

Suggested change
if submission_mode_enum == SubmissionMode.PROFILE:
raise HTTPException(
status_code=400,
detail="Profile submissions are not currently supported via API, use Discord instead.",
)

Copilot uses AI. Check for mistakes.
Comment on lines 27 to 28
- name: Create input files
shell: bash

Copilot AI Nov 10, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Create input files" step runs apt-get commands without sudo and without a container context. The previous version used a container (image: nvidia/cuda:12.4.0-devel-ubuntu22.04), which provided a root environment. Without a container, these commands will likely fail on the self-hosted runner unless the runner is configured to run as root (which is a security risk). Consider either restoring the container setup or using sudo for the apt-get commands, or ensure jq is pre-installed on the runner.

Copilot uses AI. Check for mistakes.
def log_one(base_name):
spec = run.result.get(f"{base_name}.spec")

report: str = run.result.get(f"{base_name}.report")

Copilot AI Nov 10, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential error if report is None. The code calls .encode() on report without checking if it exists first. If benchmark.{i}.report is not in the result dictionary, this will raise an AttributeError. Consider adding a check:

report: str = run.result.get(f"{base_name}.report")
if report is None:
    return
report = base64.b64decode(report.encode("utf-8"), b"+*").decode("utf-8")
Suggested change
report: str = run.result.get(f"{base_name}.report")
report: str = run.result.get(f"{base_name}.report")
if report is None:
return

Copilot uses AI. Check for mistakes.
Comment thread examples/eval.py
"""
Profiles a single benchmark using ncu. Note: this does not
invoke NCU; instead, it is expected that eval is launched
under NCU, and this function will rurnthe kernel excactly

Copilot AI Nov 10, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the docstring: "rurnthe" should be "run the".

Suggested change
under NCU, and this function will rurnthe kernel excactly
under NCU, and this function will run the kernel excactly

Copilot uses AI. Check for mistakes.
Comment on lines +368 to +370
report.add_file(
f"profile-{_shortname(prof_run.run.result.get('benchmark.0.spec'))}.zip",
f"{prof_run.profile.profiler} report - " + prof_run.run.result.get("benchmark.0.spec"),

Copilot AI Nov 10, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential error if benchmark.0.spec is not in the result dictionary. The get() method will return None if the key doesn't exist, which will cause _shortname() to fail with an AttributeError when trying to call .replace() on None. Consider adding a default value or null check:

spec = prof_run.run.result.get('benchmark.0.spec', 'unknown')
report.add_file(
    f"profile-{_shortname(spec)}.zip",
    f"{prof_run.profile.profiler} report - {spec}",
    base64.b64decode(prof_run.profile.trace),
)
Suggested change
report.add_file(
f"profile-{_shortname(prof_run.run.result.get('benchmark.0.spec'))}.zip",
f"{prof_run.profile.profiler} report - " + prof_run.run.result.get("benchmark.0.spec"),
spec = prof_run.run.result.get('benchmark.0.spec', 'unknown')
report.add_file(
f"profile-{_shortname(spec)}.zip",
f"{prof_run.profile.profiler} report - {spec}",

Copilot uses AI. Check for mistakes.

if prof_run.profile.trace is not None:
report.add_log(
f"Profiling {prof_run.run.result.get('benchmark.0.spec')}",

Copilot AI Nov 10, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential error if benchmark.0.spec is not in the result dictionary. The get() method will return None if the key doesn't exist, which will be interpolated as the string "None" in the log header. Consider adding a default value:

f"Profiling {prof_run.run.result.get('benchmark.0.spec', 'unknown benchmark')}"
Suggested change
f"Profiling {prof_run.run.result.get('benchmark.0.spec')}",
f"Profiling {prof_run.run.result.get('benchmark.0.spec', 'unknown benchmark')}",

Copilot uses AI. Check for mistakes.
Comment thread examples/eval.py
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
with nvtx_range("custom_kernel"):
submission_output = custom_kernel(_clone_data(data, 0))
submission_output = custom_kernel(cloned)

Copilot AI Nov 10, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable submission_output is not used.

Copilot uses AI. Check for mistakes.
Comment thread examples/eval.py

cloned = _clone_data(data, 0)
with nvtx_range("custom_kernel"):
submission_output = custom_kernel(cloned)

Copilot AI Nov 10, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable submission_output is not used.

Suggested change
submission_output = custom_kernel(cloned)
custom_kernel(cloned)

Copilot uses AI. Check for mistakes.
] + call

run_result = run_program(
call, seed=seed, timeout=timeout, multi_gpu=multi_gpu, extra_env={"POPCORN_NCU": "1"}

Copilot AI Nov 10, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Copilot uses AI. Check for mistakes.
SinatrasC pushed a commit to SinatrasC/kernelbot that referenced this pull request Jun 17, 2026
* Change runner from gpumode-nvidia-arc to Nvidia-A100

* Update nvidia-arc-health.yml

* Update nvidia-arc-health.yml

* Feat: run health on b200

* tmp

* tmp

* tmp

* feat

* feat

* feat

* replace nvidia workflow to point to our b200 cluster

* Fix: container

* Fix: python->python3

* Fix: add back deps

* Fix: python->python3

* Fix: python->python3

* Add nvidia-smi

* split profiling into rocm/ncu;
small code improvements

* profile each benchmark individually for cleaner traces

* profile in tempdir

* send profile results as attached files

* don't spam alerts

* include default ncu report

* attempt at filtered ncu

* formatting fix

* fix tests

* Fix: good error for profile via api

* Fix: remove nvidia-smi from workflow

* Fix: polling time to 15s

* limit profiling report length

* limit number of kernels to be profiled

* stricter matching for kernel name lines

* add an additional safety limit to ncu reports

* fix

* Fix: style

---------

Co-authored-by: Mark Saroufim <marksaroufim@meta.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
Co-authored-by: Alex Zhang <alex.lx.zhang@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants