test(degeneration): re-enable GreedyDeterminism + audit remaining open bugs by kekzl · Pull Request #173 · kekzl/imp

kekzl · 2026-05-14T12:41:27Z

Re-enables DISABLED_GreedyDeterminism as GreedyDeterminism. The failure was a pre-existing gotcha (cuBLAS algorithm non-determinism on Blackwell sm_120 without CUBLAS_WORKSPACE_CONFIG); the fix is a setenv in SetUpTestSuite. This validates that imp is deterministic given a deterministic GEMM. Also documents the DISABLED_DispatchManual investigation (NaN output when invoked manually vs. through the run_test() helper; root cause not isolated, kept DISABLED with a debug recipe). The open-bug audit summary is in the commit message.
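A minimal sketch of the setenv-in-SetUpTestSuite idea, not the code from this PR. To stay self-contained it does not include GoogleTest: `DeterminismFixtureSketch` and `workspace_config()` are hypothetical names, and the struct stands in for a fixture that would normally derive from `::testing::Test`, where GoogleTest calls the static `SetUpTestSuite()` once before the first test of the suite. The value `:4096:8` is the one quoted in the commit message.

```cpp
#include <cassert>
#include <stdlib.h>   // setenv, getenv (POSIX)
#include <string>

// Hypothetical stand-in for the real fixture. In an actual suite this would
// derive from ::testing::Test and GoogleTest would invoke SetUpTestSuite().
struct DeterminismFixtureSketch {
    static void SetUpTestSuite() {
        // Must run before any cuBLAS handle is created. The final argument
        // (overwrite = 1) replaces a value already present in the environment.
        int rc = setenv("CUBLAS_WORKSPACE_CONFIG", ":4096:8", 1);
        assert(rc == 0);
        (void)rc;  // silence the unused-variable warning under NDEBUG
        // Engine / cuBLAS handle creation would follow here.
    }
};

// Reads the variable back; returns an empty string when it is unset.
std::string workspace_config() {
    const char* value = getenv("CUBLAS_WORKSPACE_CONFIG");
    return value ? std::string(value) : std::string();
}
```

Setting the variable inside the test binary, rather than relying on the caller's shell, keeps the test reproducible no matter how it is launched, as long as nothing creates a cuBLAS handle earlier in the process.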

…CONFIG setenv

Pre-existing DISABLED_GreedyDeterminism was a known gotcha, not an imp bug: cuBLAS on Blackwell sm_120 picks different GEMM algorithms across calls within the same process unless CUBLAS_WORKSPACE_CONFIG=:4096:8 is set BEFORE the cuBLAS handle is created. Greedy decode (temp=0) then diverges due to accumulated FP16 rounding from different algorithms.

Fix: set the env var in SetUpTestSuite() before the test fixture creates the engine. Renamed DISABLED_GreedyDeterminism → GreedyDeterminism. Validates that imp itself IS deterministic when given a deterministic GEMM dispatch.

## Secondary

DISABLED_DispatchManual (test_attention_fmha_sm120.cu): investigated but could not root-cause in a single session. The kernel produces NaN when called via a direct manual setup but works fine via run_test() with identical Tensor shapes + data patterns. Likely a CUDA-stream / initial-state interaction specific to top-level TEST_F invocation. Kept DISABLED with expanded comment documenting the observed behavior + repro recipe (gtest_also_run_disabled_tests + Q nonzero / O NaN debug prints) for the next debug session.

## Validation

- imp-tests --gtest_filter='*GreedyDeterminism*' → PASSES
- verify-fast green (decode -2.00%, prefill -0.80%, graphs 1.39×)
- All other DegenerationTest cases still pass

## Bug-audit summary (no fix actionable today)

- Qwen3.5-27B MXFP4 IMA: model not local, can't reproduce
- Mistral-3.2-NVFP4 long-context: model not local, can't reproduce
- Gemma-4 Q4_K_M degeneration: deep Q4_K precision issue, Q8_0 workaround documented
- DISABLED_BasicHD256 (MXFP4 FMHA): architectural smem limit, kernel optimization required
- DISABLED_DispatchManual: investigated, deferred (see comment above)
- NVFP4 dequant Stage-2 cuBLAS replacement: multi-day work
- Spec-decode self-spec on stock models: conceptual issue, MTP scaffold shipped (PR #172)
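The commit message attributes the divergence to rounding that differs between GEMM algorithms. A toy illustration of that mechanism, assuming plain float32 on the CPU rather than FP16 inside cuBLAS: floating-point addition is not associative, so summing the same values in a different order can give a different result. `sum_in_order` and `reduction_order_matters` are hypothetical helpers written for this sketch.

```cpp
#include <cassert>
#include <vector>

// Accumulates strictly left to right in single precision.
float sum_in_order(const std::vector<float>& values) {
    float acc = 0.0f;
    for (float v : values) {
        acc += v;
    }
    return acc;
}

// Same three values, two orders. Near 1e8 adjacent floats are 8 apart, so
// adding 1.0f to 1e8f is rounded away in the first order but survives in
// the second, where the large terms cancel before 1.0f is added.
bool reduction_order_matters() {
    const std::vector<float> large_first = {1e8f, 1.0f, -1e8f};
    const std::vector<float> cancel_first = {1e8f, -1e8f, 1.0f};
    return sum_in_order(large_first) != sum_in_order(cancel_first);
}
```

In a greedy decode a difference of this kind only has to flip one argmax for the two token sequences to diverge from that point on, which is consistent with the test needing a fixed GEMM algorithm to be meaningful.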

github-actions Bot enabled auto-merge (squash) May 14, 2026 12:41

github-actions Bot merged commit 95d0487 into main May 14, 2026
3 checks passed

kekzl deleted the fix/open-bugs branch May 14, 2026 21:16

github-actions[bot] merged 1 commit into main from fix/open-bugs
