chore(build): clean all compile warnings (156 → 0) #176
Merged
Five distinct warning classes addressed:
1. cudaGetDriverEntryPoint → cudaGetDriverEntryPointByVersion
Deprecated since CUDA 12.5. Three call sites in src/compute/.
Pass CUDA_VERSION macro so the lookup matches the toolkit version.
2. TensorKind::_COUNT case added to tensor_kind_name switch
Silences -Wswitch; falls through to existing UNKNOWN return.
3. __launch_bounds__(kMRThreads, 8) → __launch_bounds__(kMRThreads)
on 12 GEMV kernels in nvfp4_gemm.cu / mxfp4_gemm.cu. kMRThreads=256,
sm_120 caps at 1536 threads/SM, so 8×256=2048 exceeded the limit and
ptxas warned and ignored the .minnctapersm hint anyway. Dropping the
2nd arg leaves the hard threads-per-block bound and removes the
already-ignored hint. SASS unchanged.
4. test_llm_compressor_loader.cpp:
- system("mkdir -p")/system("rm -rf") → std::filesystem::create_directories
/ remove_all (-Wunused-result, ~11 sites). Avoids shelling out.
- getenv("TMPDIR") ?: "/tmp" GNU extension → small tmpdir() helper
(-Wpedantic, 3 sites).
5. mxf4nvf4_mma_variants_bench.cu BENCH_PREAMBLE macro:
- (void)var; doesn't suppress NVCC #550-D "set but never used"
(~140 instances across 10 kernels). Replaced with [[maybe_unused]]
C++17 attribute on the declarations.
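Item 1 above passes CUDA_VERSION (defined in cuda.h) to the lookup. That macro encodes the toolkit as 1000 × major + 10 × minor, so CUDA 12.5 is 12050. The sketch below checks the arithmetic in plain C++; the helper names are hypothetical, and the by-version call appears only as a comment because it needs the CUDA headers. The symbol name in that comment is an example, not necessarily one of the three call sites.

```cpp
#include <cassert>

// Hypothetical helpers: CUDA_VERSION from <cuda.h> encodes the toolkit
// version as 1000 * major + 10 * minor (CUDA 12.5 -> 12050).
constexpr int encode_cuda_version(int major, int minor) {
    return 1000 * major + 10 * minor;
}
constexpr int cuda_major(int version) { return version / 1000; }
constexpr int cuda_minor(int version) { return (version % 1000) / 10; }

// Version in which, per the commit, the old lookup was deprecated.
constexpr int kOldLookupDeprecatedIn = encode_cuda_version(12, 5);
static_assert(kOldLookupDeprecatedIn == 12050, "CUDA 12.5 encodes as 12050");

// In a .cu file, a call site passes the compile-time toolkit version
// (the symbol name is only an example):
//
//   void* fn = nullptr;
//   cudaDriverEntryPointQueryResult status;
//   cudaGetDriverEntryPointByVersion("cuTensorMapEncodeTiled", &fn,
//                                    CUDA_VERSION, cudaEnableDefault, &status);
//   // use fn only if status == cudaDriverEntryPointSuccess
```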
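The switch fix from item 2 above looks roughly like this. The enumerators other than _COUNT and the returned strings are hypothetical; the point is that an explicit case for the sentinel satisfies -Wswitch while the existing fallback return stays in place.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the project's enum; only _COUNT and the
// UNKNOWN fallback come from the commit message.
enum class TensorKind { Weight, Scale, Bias, _COUNT };

const char* tensor_kind_name(TensorKind k) {
    switch (k) {
        case TensorKind::Weight: return "weight";
        case TensorKind::Scale:  return "scale";
        case TensorKind::Bias:   return "bias";
        case TensorKind::_COUNT: break;  // sentinel, not a real kind
    }
    return "UNKNOWN";  // reached for _COUNT (and any out-of-range value)
}
```

An explicit sentinel case is preferable to a `default:` label here: with `default:` present, GCC's and Clang's -Wswitch no longer flag a newly added real enumerator that the switch forgets to name.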
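The occupancy arithmetic behind item 3 above can be checked at compile time. A minimal sketch: kMRThreads and the 1536 threads/SM limit are the values quoted in the commit, the other names are hypothetical, and the bound considers only the thread limit (registers and shared memory can lower it further, never raise it).

```cpp
#include <cassert>

constexpr int kMRThreads = 256;               // threads per block, from the commit
constexpr int kMaxThreadsPerSmSm120 = 1536;   // sm_120 limit quoted in the commit

// Upper bound on resident blocks per SM imposed by the thread limit alone.
constexpr int max_blocks_by_threads(int threads_per_block, int max_threads_per_sm) {
    return max_threads_per_sm / threads_per_block;
}

// The old hint, __launch_bounds__(kMRThreads, 8), asked for 8 resident blocks.
static_assert(8 * kMRThreads == 2048, "old hint requested 2048 threads/SM");
static_assert(8 * kMRThreads > kMaxThreadsPerSmSm120, "unreachable on sm_120");
static_assert(max_blocks_by_threads(kMRThreads, kMaxThreadsPerSmSm120) == 6,
              "at most 6 blocks of 256 threads fit");

// After the change only the per-block bound remains, e.g. (kernel name hypothetical):
//   __global__ void __launch_bounds__(kMRThreads) gemv_kernel(/* ... */);
```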
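Item 4 above swaps shell calls and a GNU extension for portable C++17. A minimal sketch: the tmpdir() name comes from the commit, but its body and the make_dirs/remove_tree wrappers are hypothetical. The std::error_code overloads report failure through the return value instead of throwing, and there is no ignored system() result left for -Wunused-result to flag.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Portable form of `getenv("TMPDIR") ?: "/tmp"`: like the GNU shortcut, it
// falls back only when TMPDIR is unset (an empty value is returned as-is).
std::string tmpdir() {
    const char* env = std::getenv("TMPDIR");
    return env ? std::string(env) : std::string("/tmp");
}

// Replaces system("mkdir -p ..."): no shell, and "already exists" is not an error.
bool make_dirs(const fs::path& dir) {
    std::error_code ec;
    fs::create_directories(dir, ec);
    return !ec;
}

// Replaces system("rm -rf ..."): a missing path is not an error either.
bool remove_tree(const fs::path& dir) {
    std::error_code ec;
    fs::remove_all(dir, ec);
    return !ec;
}
```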
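The pattern from item 5 above can be sketched host-side. This is a hypothetical reduction, not the real BENCH_PREAMBLE: every kernel gets the same locals but reads only some of them, and the C++17 attribute sits on each declaration rather than relying on a trailing `(void)var;`, which per the commit did not silence NVCC #550-D. On GCC and Clang the same attribute also suppresses -Wunused-variable and -Wunused-but-set-variable.

```cpp
#include <cassert>

// Hypothetical preamble: shared per-thread indices, each marked [[maybe_unused]]
// so kernels that skip some of them stay warning-free.
#define BENCH_PREAMBLE(tid)                     \
    [[maybe_unused]] int lane = (tid) % 32;     \
    [[maybe_unused]] int warp = (tid) / 32;     \
    [[maybe_unused]] int quad = lane / 4

int bench_uses_lane(int tid) {
    BENCH_PREAMBLE(tid);
    return lane;  // warp and quad are never read here
}

int bench_uses_quad(int tid) {
    BENCH_PREAMBLE(tid);
    return quad;  // lane is read by quad's initializer; warp is not read
}
```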
Also refreshes tests/perf_baseline.json (post-rebuild container state).
Decode tg128 148.15 → 152.01 tok/s, prefill pp512 14260.84 → 14419.79 tok/s. Both
metrics improved; the refresh raises the bar for future regressions.
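Because a regression gate is relative to the stored baseline, raising the baseline raises the minimum passing throughput. A minimal sketch, assuming a simple fractional-tolerance gate; passes_gate, rel_change, and the 5% tolerance used below are hypothetical, not the project's verify-fast logic or the perf_baseline.json schema.

```cpp
#include <cassert>

// Hypothetical gate: a run passes when measured throughput is at least
// (1 - tolerance) times the stored baseline.
constexpr bool passes_gate(double measured, double baseline, double tolerance) {
    return measured >= baseline * (1.0 - tolerance);
}

// Fractional change from the old baseline to the new one.
constexpr double rel_change(double old_value, double new_value) {
    return (new_value - old_value) / old_value;
}

// Baselines from the commit message (tok/s).
constexpr double kDecodeOld = 148.15, kDecodeNew = 152.01;
constexpr double kPrefillOld = 14260.84, kPrefillNew = 14419.79;
```

With these figures the decode baseline rises by about 2.6% and prefill by about 1.1%. Under the assumed 5% tolerance, the minimum passing decode throughput moves from about 140.7 to about 144.4 tok/s, so a 142 tok/s run that cleared the old gate would fail the new one.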
Verified: make build → 0 warnings, 0 errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fix every warning emitted by `make build` (test build included): 156 → 0.

Warning classes addressed
| File | Warning | Fix |
|---|---|---|
| `src/compute/*.cu` (3 files) | `cudaGetDriverEntryPoint` deprecated | `cudaGetDriverEntryPointByVersion` with `CUDA_VERSION` |
| `src/model/tensor_kind_name.cpp` | `-Wswitch`: `_COUNT` not handled | `_COUNT` case → falls through to `UNKNOWN` return |
| `src/quant/{nvfp4,mxfp4}_gemm.cu` (12 kernels) | ptxas: `minnctapersm` out of range | Drop the `, 8` hint from `__launch_bounds__`. SM120 caps at 1536 threads/SM and 8×256=2048 exceeded it, so ptxas was ignoring the hint anyway. SASS unchanged. |
| `tests/test_llm_compressor_loader.cpp` | `-Wunused-result` (~11×) + `-Wpedantic` (3×) | Replace `system("mkdir -p")`/`system("rm -rf")` with `std::filesystem::create_directories`/`remove_all`; replace the GNU `?:` shortcut with a `tmpdir()` helper |
| `src/compute/mxf4nvf4_mma_variants_bench.cu` `BENCH_PREAMBLE` | `#550-D: set but never used` (~140×) | `(void)var;` doesn't silence NVCC #550-D; replace with the C++17 `[[maybe_unused]]` attribute on the declarations |

Baseline refresh
`tests/perf_baseline.json` was last set during the MTP work earlier today; container/cuBLAS state has since shifted (multiple Docker rebuilds in this session). Refreshing against the current container raises the gate slightly: tg128 148.15 → 152.01 tok/s, pp512 14260.84 → 14419.79 tok/s. Both metrics improved, so the refresh tightens the regression threshold rather than loosening it.

Validation
- `make build` → 0 warnings, 0 errors
- `make verify-fast` → all gates green (decode +0%, prefill -2.68%, graphs-ON 1.91× decode speedup, smoke prompts coherent)

🤖 Generated with Claude Code