
runner: add vector challenge cleanup #5

Open
chokevin wants to merge 2 commits into main from kernel-challenge-cleanup-20260505

chokevin (Owner) commented May 5, 2026

What

Adds the vectorsum_v2 kernel-challenge track: a standalone submission.py, reusable vector-sum kernels, a benchmark runner, a CLI command, and Rune experiment/dispatch wiring. It also carries the FSDP sweep knobs used in the optimization work, and makes the NCU parser tests self-contained by generating synthetic fixtures instead of depending on artifacts in the git-ignored runs/ directory.
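The kernels themselves are not reproduced here. As a rough illustration of the reduction pattern a vector-sum kernel usually follows, the sketch below does a two-stage blocked sum (per-block partial sums, then a final pass) and checks it against a correctly rounded reference. `blocked_sum`, `check_against_reference`, and the default block size of 256 are hypothetical names and values chosen for this example, not the PR's API; on a GPU each block would map to one thread block, while here both stages run sequentially in plain Python.

```python
import math


def blocked_sum(values: list[float], block_size: int = 256) -> float:
    """Hypothetical two-stage reduction: per-block partial sums, then a final pass.

    On a GPU each block would be reduced by one thread block; here both stages
    run sequentially in Python purely to show the data flow.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Stage 1: one partial sum per contiguous block of `block_size` elements.
    partials = [
        sum(values[start:start + block_size])
        for start in range(0, len(values), block_size)
    ]
    # Stage 2: reduce the partials to a single scalar.
    return sum(partials)


def check_against_reference(values: list[float], rtol: float = 1e-6) -> bool:
    """Compare against math.fsum (correctly rounded) within a relative tolerance."""
    ref = math.fsum(values)
    got = blocked_sum(values)
    return math.isclose(got, ref, rel_tol=rtol, abs_tol=1e-12)
```

A tolerance-based check like this is the usual way to validate reductions, because changing the block size changes the summation order and therefore the rounding.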

Why

The kernel-challenge worktree mixed useful optimization code with local scratch artifacts and environment assumptions. This PR moves the useful code onto a feature branch, brings the generated Rune profile pack back in sync, ignores .tmp/, and fixes the Python test setup so that a fresh checkout can run the gates after uv sync --extra dev.
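Keeping a generated file in sync is typically enforced with a check mode that re-renders the file and fails on any drift, which is what generate-rune-profiles --check is used for in the testing steps. The sketch below shows that pattern under stated assumptions: `render_profiles` is a hypothetical, simplified renderer (not the real YAML schema), and `check_profiles` is a hypothetical helper that returns an exit code and prints a unified diff when the file on disk differs from what would be generated.

```python
import difflib
import sys
from pathlib import Path


def render_profiles(experiments: dict[str, dict]) -> str:
    """Hypothetical renderer: deterministic YAML-like text from experiment definitions."""
    lines = []
    # Sort names and keys so the output is byte-identical across runs.
    for name in sorted(experiments):
        lines.append(f"{name}:")
        for key in sorted(experiments[name]):
            lines.append(f"  {key}: {experiments[name][key]}")
    return "\n".join(lines) + "\n"


def check_profiles(out: Path, rendered: str) -> int:
    """Return 0 if `out` matches `rendered`; otherwise print a diff to stderr and return 1."""
    current = out.read_text() if out.exists() else ""
    if current == rendered:
        return 0
    diff = difflib.unified_diff(
        current.splitlines(keepends=True),
        rendered.splitlines(keepends=True),
        fromfile=str(out),
        tofile="rendered",
    )
    sys.stderr.writelines(diff)
    return 1
```

A CLI wrapper would pass the return value to sys.exit so that CI fails whenever someone edits an experiment without regenerating the pack. Deterministic ordering in the renderer is what makes a byte-for-byte comparison meaningful.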

Non-goals

  • Does not submit new cluster vector-sum sweeps.
  • Does not claim a new leaderboard result.
  • Does not change the generated Rune profile pack contents beyond keeping it in sync.

Testing

  • uv sync --extra dev
  • uv run ruff format swordfish tests submission.py
  • uv run ruff check swordfish tests submission.py
  • make test
  • uv run python -m swordfish.runner bench-vectorsum --backend torch --size 64 --dtype fp32 --repeats 1 --warmup 0 --iters 1 --device cpu --allow-cpu --arch-label a100 --out /tmp/sf-vectorsum-smoke.json
  • uv run python -m swordfish.runner generate-rune-profiles --check --out infra/rune/profiles/swordfish-pack.yaml
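The smoke command above exercises the --warmup, --iters, and --repeats knobs at their smallest values. The sketch below shows one common way those three parameters fit together; the semantics are an assumption, not taken from the runner's source. `bench` is a hypothetical helper that runs `warmup` untimed calls, then collects `repeats` samples, each averaging `iters` back-to-back calls. It uses a host-side wall clock, so timing a CUDA backend this way would additionally require torch.cuda.synchronize() before each clock read, because kernel launches are asynchronous.

```python
import json
import statistics
import time
from typing import Callable


def bench(fn: Callable[[], object], *, warmup: int, iters: int, repeats: int) -> dict:
    """Hypothetical timing loop mirroring --warmup/--iters/--repeats.

    Assumed semantics: `warmup` untimed calls, then `repeats` timed samples,
    each averaging `iters` back-to-back calls.
    """
    if iters < 1 or repeats < 1:
        raise ValueError("iters and repeats must be >= 1")
    for _ in range(warmup):
        fn()  # untimed: lets caches, allocators, and JITs settle
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        elapsed = time.perf_counter() - start
        samples_ms.append(elapsed * 1e3 / iters)  # mean ms per call in this sample
    return {
        "warmup": warmup,
        "iters": iters,
        "repeats": repeats,
        "median_ms": statistics.median(samples_ms),
        "samples_ms": samples_ms,
    }


if __name__ == "__main__":
    data = [1.0] * 64  # same size as the --size 64 smoke run
    print(json.dumps(bench(lambda: sum(data), warmup=0, iters=1, repeats=1), indent=2))
```

Reporting the median across repeats rather than the mean keeps a single noisy sample from skewing the result.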

Risk

Low for local users: the new vector-sum paths are additive and covered by CPU smoke tests. Cluster risk is limited to dispatch rendering for the new experiment and the FSDP knobs; to roll back, revert this PR or keep using the existing gemm/liger-* experiment names.
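Dispatch rendering for a sweep usually means expanding a grid of knob values into one named config per combination, so the risky surface is mostly naming and ordering. The sketch below illustrates that expansion. `expand_sweep` is a hypothetical helper, and the knob names in the test (sharding_strategy, limit_all_gathers) are borrowed from PyTorch's FullyShardedDataParallel constructor purely for illustration; the PR description does not say which FSDP knobs it adds.

```python
import itertools


def expand_sweep(base_name: str, knobs: dict[str, list]) -> list[dict]:
    """Expand a knob grid into one config per combination (hypothetical helper).

    Keys are sorted so the rendered order, and therefore any generated
    dispatch files, is deterministic across runs.
    """
    keys = sorted(knobs)
    configs = []
    for values in itertools.product(*(knobs[k] for k in keys)):
        params = dict(zip(keys, values))
        suffix = "-".join(f"{k}={v}" for k, v in params.items())
        # With no knobs, product() yields a single empty combination: keep the base name.
        name = f"{base_name}-{suffix}" if suffix else base_name
        configs.append({"name": name, **params})
    return configs
```

Two knobs with two values each produce four configs. Since the sweep size is the product of the list lengths, a small addition to the grid can multiply the number of dispatched jobs.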

Ubuntu and others added 2 commits May 5, 2026 06:31
Add the vectorsum_v2 benchmark path, standalone submission entrypoint, Rune dispatch wiring, and FSDP sweep knobs used by the kernel-challenge work.

Make local Python tests self-contained by generating synthetic NCU CSV fixtures instead of depending on ignored runs/ artifacts, and ignore local .tmp scratch output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
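The synthetic-fixture approach from this commit can be sketched as follows. The column list is an assumed subset of what Nsight Compute emits with --csv --page details, and `write_synthetic_ncu_csv` and `parse_metric` are hypothetical helpers; the PR's real parser may expect different columns. The point is that a test writes a tiny CSV into a temporary directory (for example pytest's tmp_path) instead of reading a profile from the ignored runs/ tree.

```python
import csv
from pathlib import Path

# Assumed subset of the columns Nsight Compute emits with `--csv --page details`;
# the real parser in this PR may expect more or different columns.
NCU_COLUMNS = ["ID", "Kernel Name", "Section Name", "Metric Name", "Metric Unit", "Metric Value"]


def write_synthetic_ncu_csv(path: Path, rows: list[dict]) -> Path:
    """Write a tiny NCU-style CSV fixture so tests never touch ignored runs/ artifacts."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=NCU_COLUMNS, quoting=csv.QUOTE_ALL)
        writer.writeheader()
        writer.writerows(rows)
    return path


def parse_metric(path: Path, kernel: str, metric: str) -> float:
    """Hypothetical parser: return one metric value for a kernel as a float."""
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            if row["Kernel Name"] == kernel and row["Metric Name"] == metric:
                # Values may carry thousands separators, so strip commas before converting.
                return float(row["Metric Value"].replace(",", ""))
    raise KeyError(f"{metric} not found for {kernel}")
```

Because the fixture is generated inside the test, a fresh checkout needs nothing beyond the dev extra to run the parser tests.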
Make the test target request the uv dev extra explicitly so ruff and pytest are stable after ordinary uv run commands.

Use the public NVCR PyTorch base for GitHub Actions image builds while keeping the Dockerfile default private ACR base for the canonical Azure ACR build path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>