8 changes: 8 additions & 0 deletions docs/profiling/INDEX.md
@@ -0,0 +1,8 @@
# Autoresearch run index

One row per profiling run produced by the swordfish-autoresearch chart.
Newest first. PR column links to the draft PR carrying the artifacts.

| timestamp (UTC) | source SHA | shapes | impls | GPU | 8b-b1 marlin TFLOPS | run dir | PR |
|---|---|---|---|---|---|---|---|
| 20260420T020830Z | `eb0f6e3` | voice | fp16,marlin | NVIDIA A100-SXM4-80GB | 0.7 | [`20260420T020830Z/`](./marlin/20260420T020830Z/) | [link](https://github.com/chokevin/swordfish/pull/4) |
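
The "newest first" ordering can be derived from the run IDs alone, since each is a compact UTC timestamp. A minimal sketch, assuming the ID format shown in the table; the helper names and the second run ID are hypothetical and not taken from the chart's tooling:

```python
from datetime import datetime, timezone


def parse_run_id(run_id: str) -> datetime:
    """Parse a compact UTC run ID such as '20260420T020830Z'."""
    parsed = datetime.strptime(run_id, "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def newest_first(run_ids):
    """Return run IDs sorted so the most recent run comes first."""
    return sorted(run_ids, key=parse_run_id, reverse=True)


# The first ID is made up for illustration; the second is the run in this PR.
runs = ["20260419T230000Z", "20260420T020830Z"]
print(newest_first(runs))
```

Because the IDs are fixed-width and zero-padded, a plain reverse string sort would give the same order; parsing them additionally rejects malformed IDs.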
Empty file.
10 changes: 10 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/70b-tp2-b1.ncu.csv
@@ -0,0 +1,10 @@
==PROF== Connected to process 1198 (/usr/bin/python3.10)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
==ERROR== Failed to profile "distribution_elementwise_grid..." in process 1198
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
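
Note that this `.ncu.csv` artifact holds ncu's error output rather than CSV data. A minimal sketch of a check that could flag such captures before they are committed, assuming only the `==ERROR==` prefix and the "No kernels were profiled" warning seen in these logs; the function name is hypothetical:

```python
def ncu_capture_failed(text: str) -> bool:
    """Return True if ncu output shows an error or profiled no kernels."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    has_error = any(ln.startswith("==ERROR==") for ln in lines)
    no_kernels = any("No kernels were profiled" in ln for ln in lines)
    return has_error or no_kernels


# Excerpt of the output captured in this run.
sample = """==PROF== Connected to process 1198 (/usr/bin/python3.10)
==ERROR== An error was reported by the driver
==WARNING== No kernels were profiled.
"""
print(ncu_capture_failed(sample))  # True
```

The same check applies to every `.ncu.csv` and `.ncu.log` file in this run directory, all of which report the driver-resource error.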
10 changes: 10 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/70b-tp2-b1.ncu.log
@@ -0,0 +1,10 @@
==PROF== Connected to process 1117 (/usr/bin/python3.10)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
==ERROR== Failed to profile "distribution_elementwise_grid..." in process 1117
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
Empty file.
10 changes: 10 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/70b-tp2-b4.ncu.csv
@@ -0,0 +1,10 @@
==PROF== Connected to process 1360 (/usr/bin/python3.10)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
==ERROR== Failed to profile "distribution_elementwise_grid..." in process 1360
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
10 changes: 10 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/70b-tp2-b4.ncu.log
@@ -0,0 +1,10 @@
==PROF== Connected to process 1279 (/usr/bin/python3.10)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
==ERROR== Failed to profile "distribution_elementwise_grid..." in process 1279
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
Empty file.
10 changes: 10 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/70b-tp2-b8.ncu.csv
@@ -0,0 +1,10 @@
==PROF== Connected to process 1522 (/usr/bin/python3.10)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
==ERROR== Failed to profile "distribution_elementwise_grid..." in process 1522
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
10 changes: 10 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/70b-tp2-b8.ncu.log
@@ -0,0 +1,10 @@
==PROF== Connected to process 1441 (/usr/bin/python3.10)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
==ERROR== Failed to profile "distribution_elementwise_grid..." in process 1441
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
Empty file.
10 changes: 10 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/8b-b1.ncu.csv
@@ -0,0 +1,10 @@
==PROF== Connected to process 712 (/usr/bin/python3.10)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
==ERROR== Failed to profile "distribution_elementwise_grid..." in process 712
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
10 changes: 10 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/8b-b1.ncu.log
@@ -0,0 +1,10 @@
==PROF== Connected to process 631 (/usr/bin/python3.10)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
==ERROR== Failed to profile "distribution_elementwise_grid..." in process 631
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
Empty file.
10 changes: 10 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/8b-b4.ncu.csv
@@ -0,0 +1,10 @@
==PROF== Connected to process 874 (/usr/bin/python3.10)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
==ERROR== Failed to profile "distribution_elementwise_grid..." in process 874
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
10 changes: 10 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/8b-b4.ncu.log
@@ -0,0 +1,10 @@
==PROF== Connected to process 793 (/usr/bin/python3.10)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
==ERROR== Failed to profile "distribution_elementwise_grid..." in process 793
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
Empty file.
10 changes: 10 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/8b-b8.ncu.csv
@@ -0,0 +1,10 @@
==PROF== Connected to process 1036 (/usr/bin/python3.10)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
==ERROR== Failed to profile "distribution_elementwise_grid..." in process 1036
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
10 changes: 10 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/8b-b8.ncu.log
@@ -0,0 +1,10 @@
==PROF== Connected to process 955 (/usr/bin/python3.10)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
==ERROR== Failed to profile "distribution_elementwise_grid..." in process 955
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
26 changes: 26 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/SUMMARY.md
@@ -0,0 +1,26 @@
# Autoresearch run `20260420T020830Z`

- **source SHA:** `eb0f6e3`
- **GPU:** NVIDIA A100-SXM4-80GB (cc 8.0, 79.3 GB)
- **CUDA / torch / triton:** 12.4 / 2.4.0a0+07cecf4168.nv24.05 / 3.0.0
- **shapes:** `voice` **impls:** `fp16,marlin` **repeats:** 5
- **marlin SHA:** `1f25790bdd49fba53106164a24666dade68d7c90`

## Results

| shape | impl | ms_mean | ms_p95 | TFLOPS | speedup vs fp16 | error |
|---|---|---|---|---|---|---|
| 8b-b1 | fp16 | 0.031 | 0.032 | 1.1 | x1.00 | |
| 8b-b1 | marlin | 0.050 | 0.055 | 0.7 | x0.62 | |
| 8b-b4 | fp16 | 0.031 | 0.031 | 4.3 | x1.00 | |
| 8b-b4 | marlin | 0.049 | 0.049 | 2.7 | x0.63 | |
| 8b-b8 | fp16 | 0.031 | 0.032 | 8.6 | x1.00 | |
| 8b-b8 | marlin | 0.049 | 0.050 | 5.5 | x0.63 | |
| 70b-tp2-b1 | fp16 | 0.050 | 0.055 | 1.3 | x1.00 | |
| 70b-tp2-b1 | marlin | 0.051 | 0.052 | 1.3 | x0.99 | |
| 70b-tp2-b4 | fp16 | 0.049 | 0.049 | 5.5 | x1.00 | |
| 70b-tp2-b4 | marlin | 0.049 | 0.050 | 5.4 | x1.00 | |
| 70b-tp2-b8 | fp16 | 0.049 | 0.050 | 10.9 | x1.00 | |
| 70b-tp2-b8 | marlin | 0.049 | 0.050 | 10.9 | x1.00 | |

![roofline](./roofline.png)
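
The "speedup vs fp16" column appears consistent with dividing the fp16 `ms_mean` by the implementation's `ms_mean` for the same shape. A minimal sketch under that assumption, using the rounded values from the table above; the formula is inferred from the numbers, not taken from the profiling script:

```python
# (fp16 ms_mean, marlin ms_mean) per shape, copied from the Results table.
rows = {
    "8b-b1": (0.031, 0.050),
    "8b-b4": (0.031, 0.049),
    "8b-b8": (0.031, 0.049),
    "70b-tp2-b1": (0.050, 0.051),
    "70b-tp2-b4": (0.049, 0.049),
    "70b-tp2-b8": (0.049, 0.049),
}


def speedup(fp16_ms: float, impl_ms: float) -> float:
    """Assumed definition: baseline latency divided by implementation latency."""
    return fp16_ms / impl_ms


for shape, (fp16_ms, marlin_ms) in rows.items():
    print(f"{shape}: x{speedup(fp16_ms, marlin_ms):.2f}")
```

Recomputing from the rounded latencies reproduces the table to within 0.01. The one visible difference is `70b-tp2-b1`, which comes out as x0.98 here against x0.99 in the table, presumably because the table was computed from unrounded timings.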
43 changes: 43 additions & 0 deletions docs/profiling/marlin/20260420T020830Z/env.txt
@@ -0,0 +1,43 @@
=== profile_marlin.sh @ 20260420T020830Z ===
--- host ---
Linux swordfish-profile-sf-prof-260420-020113-847z9 6.6.126.1-1.azl3 #1 SMP PREEMPT_DYNAMIC Wed Mar 4 05:04:40 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
--- nvidia-smi ---
Mon Apr 20 02:08:30 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  |   0000000B:00:00.0 Off |                    0 |
| N/A   35C    P0             69W /  400W |       0MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
--- nvcc ---
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
--- nsys ---
NVIDIA Nsight Systems version 2024.2.1.106-242134037904v0
--- ncu ---
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2024 NVIDIA Corporation
Version 2024.1.1.0 (build 33998838) (public-release)
--- python / torch / triton / marlin ---
python 3.10.12
torch 2.4.0a0+07cecf4168.nv24.05 cuda 12.4
triton 3.0.0
marlin unknown
--- repo SHA ---
eb0f6e31ca0e2e4d75507f9e77bfba6738c3a22a