chore(build): clean all compile warnings — 156 → 0 (#176)

Merged: kekzl merged 1 commit into main from chore/clean-build-warnings on May 14, 2026
@kekzl (Owner) commented May 14, 2026

Summary

Fix every warning emitted by make build (test build included). 156 → 0.

Warning classes addressed

| # | Source | Diagnostic | Fix |
|---|--------|------------|-----|
| 1 | `src/compute/*.cu` (3 files) | `cudaGetDriverEntryPoint` deprecated | Migrate to `cudaGetDriverEntryPointByVersion` with `CUDA_VERSION` |
| 2 | `src/model/tensor_kind_name.cpp` | `-Wswitch`: `_COUNT` not handled | Add `_COUNT` case → falls through to `UNKNOWN` return |
| 3 | `src/quant/{nvfp4,mxfp4}_gemm.cu` (12 kernels) | ptxas: `minnctapersm` out of range | Drop ignored `, 8` hint from `__launch_bounds__`: SM120 caps at 1536 threads/SM, 8×256=2048 exceeded it and ptxas was silently dropping the hint anyway. SASS unchanged. |
| 4 | `tests/test_llm_compressor_loader.cpp` | `-Wunused-result` (~11×) + `-Wpedantic` (3×) | Replace `system("mkdir -p")`/`system("rm -rf")` with `std::filesystem::create_directories`/`remove_all`; replace GNU `?:` shortcut with `tmpdir()` helper |
| 5 | `src/compute/mxf4nvf4_mma_variants_bench.cu` `BENCH_PREAMBLE` | NVCC #550-D: set but never used (~140×) | `(void)var;` doesn't silence NVCC #550-D; replace with C++17 `[[maybe_unused]]` attribute on declarations |
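
The arithmetic behind row 3 can be checked at compile time. The following is a minimal sketch, not code from this PR: the constants are the figures quoted above, and the helper names `max_resident_blocks` and `hint_out_of_range` are hypothetical. It counts threads only; register and shared-memory pressure can lower the real limit further.

```cpp
// Figures quoted in the PR description; the helper names are illustrative only.
constexpr int kMRThreads          = 256;   // threads per block (first __launch_bounds__ argument)
constexpr int kMaxThreadsPerSM    = 1536;  // resident-thread cap per SM, as stated for SM120
constexpr int kRequestedMinBlocks = 8;     // the dropped second __launch_bounds__ argument

// Most blocks of this size that fit on one SM when only the thread cap is considered.
constexpr int max_resident_blocks(int threads_per_block, int max_threads_per_sm) {
    return max_threads_per_sm / threads_per_block;
}

// True when a min-blocks-per-SM hint asks for more threads than one SM can hold.
constexpr bool hint_out_of_range(int threads_per_block, int min_blocks,
                                 int max_threads_per_sm) {
    return threads_per_block * min_blocks > max_threads_per_sm;
}

// 8 x 256 = 2048 > 1536, so the hint of 8 cannot be honored.
static_assert(hint_out_of_range(kMRThreads, kRequestedMinBlocks, kMaxThreadsPerSM),
              "a hint of 8 blocks exceeds the per-SM thread cap");
// 1536 / 256 = 6 is the largest value the thread cap alone would allow.
static_assert(max_resident_blocks(kMRThreads, kMaxThreadsPerSM) == 6,
              "at most 6 blocks of 256 threads fit under the thread cap");
```

Under these assumptions, any second argument above 6 would be out of range for a 256-thread block, which is consistent with ptxas ignoring the value 8.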

Baseline refresh

tests/perf_baseline.json was last set during the MTP work earlier today; container/cuBLAS state has since shifted (multiple Docker rebuilds in this session). Refreshing against the current container raises the gate slightly: tg128 148.15 → 152.01 tok/s, pp512 14260.84 → 14419.79 tok/s. Both metrics improved, so the refresh tightens the regression threshold rather than loosening it.

Validation

  • make build → 0 warnings, 0 errors
  • make verify-fast → all gates green (decode +0%, prefill -2.68%, graphs-ON 1.91× decode speedup, smoke prompts coherent)

🤖 Generated with Claude Code

Five distinct warning classes addressed:

1. cudaGetDriverEntryPoint → cudaGetDriverEntryPointByVersion
   Deprecated since CUDA 12.5. Three call sites in src/compute/.
   Pass CUDA_VERSION macro so the lookup matches the toolkit version.

2. TensorKind::_COUNT case added to tensor_kind_name switch
   Silences -Wswitch; falls through to existing UNKNOWN return.

3. __launch_bounds__(kMRThreads, 8) → __launch_bounds__(kMRThreads)
   on 12 GEMV kernels in nvfp4_gemm.cu / mxfp4_gemm.cu. kMRThreads=256,
   sm_120 caps at 1536 threads/SM, so 8×256=2048 exceeded the limit and
   ptxas silently dropped the .minnctapersm hint anyway. Dropping the
   2nd arg leaves the hard threads-per-block bound and removes the
   already-ignored hint. SASS unchanged.

4. test_llm_compressor_loader.cpp:
   - system("mkdir -p")/system("rm -rf") → std::filesystem::create_directories
     / remove_all (-Wunused-result, ~11 sites). Avoids shelling out.
   - getenv("TMPDIR") ?: "/tmp" GNU extension → small tmpdir() helper
     (-Wpedantic, 3 sites).

5. mxf4nvf4_mma_variants_bench.cu BENCH_PREAMBLE macro:
   - (void)var; doesn't suppress NVCC #550-D "set but not used"
     (~140 instances across 10 kernels). Replaced with [[maybe_unused]]
     C++17 attribute on the declarations.
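
The shell-free directory handling from item 4 might look roughly like the sketch below. It is not the code from the test file: only the name `tmpdir()` and the `std::filesystem` calls come from the commit message, while the helper body and the names `make_scratch_dir` and `remove_scratch_dir` are assumptions for illustration.

```cpp
#include <cstdint>
#include <cstdlib>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Assumed shape of the tmpdir() helper: a portable replacement for the
// GNU-only expression getenv("TMPDIR") ?: "/tmp".
// Treating an empty TMPDIR as unset is an extra choice made in this sketch.
inline std::string tmpdir() {
    const char* env = std::getenv("TMPDIR");
    return (env != nullptr && *env != '\0') ? env : "/tmp";
}

// Hypothetical helper: creates <tmpdir>/<name> including missing parents,
// like `mkdir -p`, without spawning a shell. Throws fs::filesystem_error on failure.
inline fs::path make_scratch_dir(const std::string& name) {
    fs::path dir = fs::path(tmpdir()) / name;
    fs::create_directories(dir);
    return dir;
}

// Hypothetical helper: removes a directory tree, like `rm -rf`.
// Returns the number of entries removed, or 0 if the removal reported an error.
inline std::uintmax_t remove_scratch_dir(const fs::path& dir) {
    std::error_code ec;
    const std::uintmax_t removed = fs::remove_all(dir, ec);
    return ec ? 0 : removed;
}
```

Because neither helper discards a `system()` return value, there is nothing for `-Wunused-result` to flag, and the plain conditional expression avoids the `-Wpedantic` complaint about the GNU `?:` form.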

Also refreshes tests/perf_baseline.json (post-rebuild container state).
Decode tg128 148.15 → 152.01, prefill pp512 14260.84 → 14419.79. Both
metrics improved; the refresh raises the bar for future regressions.

Verified: make build → 0 warnings, 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kekzl kekzl enabled auto-merge (squash) May 14, 2026 22:01
@kekzl kekzl merged commit b09d581 into main May 14, 2026
3 checks passed
@kekzl kekzl deleted the chore/clean-build-warnings branch May 14, 2026 22:03