test(degeneration): re-enable GreedyDeterminism + audit remaining open bugs #173
Merged
…CONFIG setenv

Pre-existing DISABLED_GreedyDeterminism was a known gotcha, not an imp bug: cuBLAS on Blackwell (sm_120) picks different GEMM algorithms across calls within the same process unless CUBLAS_WORKSPACE_CONFIG=:4096:8 is set BEFORE the cuBLAS handle is created. Greedy decode (temp=0) then diverges because the different algorithms accumulate FP16 rounding differently.

Fix: set the env var in SetUpTestSuite(), before the test fixture creates the engine. Renamed DISABLED_GreedyDeterminism → GreedyDeterminism. This validates that imp itself IS deterministic when given a deterministic GEMM dispatch.

## Secondary

DISABLED_DispatchManual (test_attention_fmha_sm120.cu): investigated but could not root-cause in a single session. The kernel produces NaN when called via a direct manual setup, but works fine via run_test() with identical Tensor shapes and data patterns. Likely a CUDA-stream / initial-state interaction specific to the top-level TEST_F invocation. Kept DISABLED, with an expanded comment documenting the observed behavior and a repro recipe (`--gtest_also_run_disabled_tests` plus "Q nonzero" / "O NaN" debug prints) for the next debug session.

## Validation

- `imp-tests --gtest_filter='*GreedyDeterminism*'` → PASSES
- verify-fast green (decode -2.00%, prefill -0.80%, graphs 1.39×)
- All other DegenerationTest cases still pass

## Bug-audit summary (no fix actionable today)

- Qwen3.5-27B MXFP4 IMA: model not local, can't reproduce
- Mistral-3.2-NVFP4 long-context: model not local, can't reproduce
- Gemma-4 Q4_K_M degeneration: deep Q4_K precision issue, Q8_0 workaround documented
- DISABLED_BasicHD256 (MXFP4 FMHA): architectural smem limit, kernel optimization required
- DISABLED_DispatchManual: investigated, deferred (see Secondary above)
- NVFP4 dequant Stage-2 cuBLAS replacement: multi-day work
- Spec-decode self-spec on stock models: conceptual issue, MTP scaffold shipped (PR #172)
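The commit message attributes the GreedyDeterminism failure to different GEMM algorithms rounding differently. The underlying effect is that floating-point addition is not associative, so two kernels that accumulate the same dot product in different orders can return different values, and greedy argmax can then pick a different token. The sketch below is only an illustration of that mechanism under stated assumptions: it uses FP32 on the host for portability (FP16 has far fewer mantissa bits, so the effect appears at much smaller magnitudes), and `dot_forward`, `dot_reverse`, and `greedy_pick` are hypothetical names, not imp or cuBLAS code.

```cpp
#include <cstddef>
#include <vector>

// Two hypothetical "GEMM algorithms": the same dot product, accumulated
// in opposite orders (as different kernels or split-K schemes might do).
float dot_forward(const std::vector<float>& a, const std::vector<float>& b) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) acc += a[i] * b[i];
    return acc;
}

float dot_reverse(const std::vector<float>& a, const std::vector<float>& b) {
    float acc = 0.0f;
    for (std::size_t i = a.size(); i-- > 0;) acc += a[i] * b[i];
    return acc;
}

// Greedy (temperature 0) decoding: index of the largest logit.
std::size_t greedy_pick(const std::vector<float>& logits) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i)
        if (logits[i] > logits[best]) best = i;
    return best;
}
```

With `a = {1e8f, -1e8f, 1.0f}` and `b = {1.0f, 1.0f, 1.0f}`, the forward order cancels the large terms first and returns 1.0f, while the reverse order absorbs the 1.0f into -1e8f (FP32 spacing there is 8) and returns 0.0f. If that value is the logit of token 0 and token 1 has a fixed logit of 0.5f, greedy decoding picks token 0 in one run and token 1 in the other, and every later step then conditions on a different prefix.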
Re-enables DISABLED_GreedyDeterminism. This was a pre-existing gotcha (cuBLAS algorithm non-determinism on Blackwell sm_120 when CUBLAS_WORKSPACE_CONFIG is not set); the fix is a setenv in SetUpTestSuite. Validates that imp is deterministic given a deterministic GEMM. Also documents the DISABLED_DispatchManual investigation (NaN output when the kernel is invoked manually vs. through the run_test() helper; root cause not isolated, so the test stays DISABLED with a debug recipe). The open-bug audit summary is in the commit message.
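The setenv fix depends on ordering: the variable must be in the process environment before the first cuBLAS handle is created. A minimal sketch, assuming a POSIX environment (`setenv`/`getenv` from `<stdlib.h>`); `ensure_cublas_workspace_config` and `DegenerationSuiteSketch` are hypothetical stand-ins, and the struct only mimics gtest's static `SetUpTestSuite()` hook so the snippet compiles without gtest. It is not imp's actual test code.

```cpp
#include <stdlib.h>  // POSIX setenv / getenv
#include <string>

// Hypothetical helper: put CUBLAS_WORKSPACE_CONFIG into the process
// environment and return the value that is in effect afterwards.
// With overwrite == false, a value the user already exported wins.
inline std::string ensure_cublas_workspace_config(const char* value = ":4096:8",
                                                  bool overwrite = false) {
    // setenv returns 0 on success and -1 on failure.
    if (setenv("CUBLAS_WORKSPACE_CONFIG", value, overwrite ? 1 : 0) != 0) {
        return std::string();
    }
    const char* effective = getenv("CUBLAS_WORKSPACE_CONFIG");
    return effective != nullptr ? std::string(effective) : std::string();
}

// Stand-in for a gtest fixture. gtest calls a fixture's static
// SetUpTestSuite() once, before the first test of the suite runs, i.e.
// before any per-test SetUp() could build an engine (and its cuBLAS handle).
struct DegenerationSuiteSketch {
    static std::string workspace_config;

    static void SetUpTestSuite() {
        // Must run before the first cuBLAS handle exists in this process.
        workspace_config = ensure_cublas_workspace_config();
        // Engine / cuBLAS handle creation would only happen after this point.
    }
};

std::string DegenerationSuiteSketch::workspace_config;
```

Defaulting to `overwrite = false` respects a value already exported in the shell, such as `:16:8`, which the cuBLAS documentation lists as the lower-memory alternative to `:4096:8` for reproducible results.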