test(degeneration): re-enable GreedyDeterminism + audit remaining open bugs #173

Merged: github-actions[bot] merged 1 commit into main from fix/open-bugs on May 14, 2026
@kekzl kekzl commented May 14, 2026

Re-enables DISABLED_GreedyDeterminism. The failure was a pre-existing gotcha, not an imp bug: cuBLAS algorithm selection is non-deterministic on Blackwell sm_120 unless CUBLAS_WORKSPACE_CONFIG is set. The fix is a setenv call in SetUpTestSuite, and the re-enabled test validates that imp is deterministic given a deterministic GEMM. Also documents the DISABLED_DispatchManual investigation (NaN output when the kernel is invoked manually vs. through the run_test() helper; root cause not isolated, test kept DISABLED with a debug recipe). The open-bug audit summary is in the commit message.

…CONFIG setenv

Pre-existing DISABLED_GreedyDeterminism was a known gotcha, not an imp bug:
cuBLAS on Blackwell sm_120 picks different GEMM algorithms across calls
within the same process unless CUBLAS_WORKSPACE_CONFIG=:4096:8 is set BEFORE
the cuBLAS handle is created. Greedy decode (temp=0) then diverges due to
accumulated FP16 rounding from different algorithms.
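
The rounding effect described above can be illustrated on the CPU. The sketch below is not cuBLAS code and uses float32 as a stand-in for FP16; the function names are hypothetical. It only shows that summing the same values in a different order, as two different GEMM algorithms may do internally, can change the result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Accumulate in index order: ((v[0] + v[1]) + v[2]) + ...
float sum_forward(const std::vector<float>& v) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) acc += v[i];
    return acc;
}

// Accumulate the same values in reverse index order.
float sum_backward(const std::vector<float>& v) {
    float acc = 0.0f;
    for (std::size_t i = v.size(); i-- > 0;) acc += v[i];
    return acc;
}
```

For the input {1e8f, -1e8f, 1.0f}, the forward sum is 1 while the backward sum is 0 under IEEE-754 float32 arithmetic, because the 1.0f is absorbed when added to -1e8f first. In greedy decoding, a small difference like this in a logit can flip an argmax, after which the token sequences diverge.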

Fix: set the env var in SetUpTestSuite() before the test fixture creates the
engine. Renamed DISABLED_GreedyDeterminism → GreedyDeterminism. Validates
that imp itself IS deterministic when given a deterministic GEMM dispatch.
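
A minimal sketch of the ordering constraint described above, assuming a POSIX environment (setenv). The helper and struct names are hypothetical, and the real fixture derives from a GoogleTest class that is omitted here so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdlib.h>  // POSIX setenv
#include <string>

// Hypothetical helper: pin the cuBLAS workspace configuration.
// Returns true on success; overwrite=1 replaces any existing value.
inline bool pin_cublas_workspace_config(const char* value = ":4096:8") {
    return ::setenv("CUBLAS_WORKSPACE_CONFIG", value, /*overwrite=*/1) == 0;
}

// Stand-in for the test fixture (the real one would derive from ::testing::Test).
struct DegenerationTestSketch {
    static void SetUpTestSuite() {
        // Must run before anything creates a cuBLAS handle.
        pin_cublas_workspace_config();
        // Engine / cuBLAS handle creation would follow here (not shown).
    }
};
```

Placing the call in SetUpTestSuite rather than in an individual test body means it runs once, before the first test of the suite constructs the engine.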

## Secondary

DISABLED_DispatchManual (test_attention_fmha_sm120.cu): investigated but
could not root-cause in a single session. The kernel produces NaN when
called via a direct manual setup but works fine via run_test() with
identical Tensor shapes + data patterns. Likely a CUDA-stream / initial-
state interaction specific to top-level TEST_F invocation. Kept DISABLED
with expanded comment documenting the observed behavior + repro recipe
(gtest_also_run_disabled_tests + Q nonzero / O NaN debug prints) for the
next debug session.
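
The actual repro recipe lives in the test comment and is not reproduced here. As a rough illustration of the "Q nonzero / O NaN" kind of check, the hypothetical helper below scans a buffer that is assumed to have already been copied to the host and converted to float; the real tensors and their element types may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Counts of finite nonzero values and NaN values in a buffer.
struct BufferStats {
    std::size_t nonzero = 0;
    std::size_t nan = 0;
};

// Hypothetical helper: scan a host-side copy of a tensor.
BufferStats scan_host_buffer(const std::vector<float>& host) {
    BufferStats s;
    for (float x : host) {
        if (std::isnan(x)) ++s.nan;
        else if (x != 0.0f) ++s.nonzero;
    }
    return s;
}

// Print one summary line per tensor, e.g. for Q (input) and O (output).
void print_stats(const char* name, const BufferStats& s, std::size_t total) {
    std::printf("%s: %zu nonzero, %zu NaN of %zu\n", name, s.nonzero, s.nan, total);
}
```

Printing these counts for Q before the kernel launch and for O after it separates "input was never populated" from "kernel produced NaN from valid input".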

## Validation
- imp-tests --gtest_filter='*GreedyDeterminism*' → PASSES
- verify-fast green (decode -2.00%, prefill -0.80%, graphs 1.39×)
- All other DegenerationTest cases still pass
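
The body of GreedyDeterminism is not shown in this PR text. A determinism check of this kind typically compares the token IDs from repeated greedy runs; the hypothetical helper below sketches that comparison and reports where two runs first differ, which is more useful in a failure message than a plain inequality.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the first position where two token
// sequences differ, or -1 if they are identical. A length mismatch counts
// as a divergence at the end of the shorter sequence.
std::ptrdiff_t first_divergence(const std::vector<int>& a,
                                const std::vector<int>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    // Compare only the common prefix of length n.
    auto its = std::mismatch(a.begin(), a.begin() + n, b.begin());
    if (its.first != a.begin() + n) return its.first - a.begin();
    if (a.size() != b.size()) return static_cast<std::ptrdiff_t>(n);
    return -1;
}
```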

## Bug-audit summary (no fix actionable today)
- Qwen3.5-27B MXFP4 IMA: model not local, can't reproduce
- Mistral-3.2-NVFP4 long-context: model not local, can't reproduce
- Gemma-4 Q4_K_M degeneration: deep Q4_K precision issue, Q8_0 workaround documented
- DISABLED_BasicHD256 (MXFP4 FMHA): architectural smem limit, kernel optimization required
- DISABLED_DispatchManual: investigated, deferred (see comment above)
- NVFP4 dequant Stage-2 cuBLAS replacement: multi-day work
- Spec-decode self-spec on stock models: conceptual issue, MTP scaffold shipped (PR #172)
@github-actions github-actions Bot enabled auto-merge (squash) May 14, 2026 12:41
@github-actions github-actions Bot merged commit 95d0487 into main May 14, 2026
3 checks passed
@kekzl kekzl deleted the fix/open-bugs branch May 14, 2026 21:16