Spec correction: compression-communication channel definition (Section 1.1) + rewrite of probes 4.22–4.25 + axis-coverage reporting #18
Draft
FluffyAIcode wants to merge 2 commits into main from
- scheme_b_v344.py: v3.42 clone + [J-1] AMS_TRAINED_WEIGHTS env hook
- train_v344.py: CPU training driver (60 steps, 398.5s)
- ckpt/train_log.jsonl + train_stdout.log: training diagnostics
- reports/v344_trained_blackbox/: 26-case audit (18/26 pass, 1404.3s)
- audit_feedback.md: Section 7 compliant analysis

Delta vs v3.42 (untrained 17/26):
- FAIL -> PASS: 4.12 prefix_stepwise_drift_trajectory, 4.21 decode_repetition_feedback_probe
- PASS -> FAIL: 4.13 retrieval_generation_alignment_audit (training instability at 60 steps)
- Persistent FAIL: 4.7, 4.10, 4.15, 4.17, 4.23, 4.24, 4.25

First 26-case run to exceed the 17+/-1 eval-time plateau.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…1.1); rewrite probes 4.22-4.25 that measured unreachable artifacts; add axis-coverage reporting (7.7, 7.8); retract pre-v3.45 single-probe channel-existence claims

Summary of corrections:
- NEW Section 1.1: precise four-axis definition (A compression / B cost / C fidelity / D stability). Replaces ambiguous 'cipher system' label used since v3.37.
- 1.1.2 explicitly legitimises prefix attention + content_bias + suppression_bias + FS + mean-centered residual as channel mechanisms (not cheats).
- 1.1.3 narrows 'banned' to the actually-banned list (prompt-keyed routing, mocks, corpus-memorised templates, per-probe code paths, stub backbones).
- 4.22 anti-cheating: remove exclusion of hard-masking. Axis = C.
- 4.21 rationale: reframe as D-axis operating-point metric, not 'anti-collapse'.
- 4.23 acceptance: replace unreachable top-3 with mean-centered top-20 + median rank <= 100. Structurally achievable.
- 4.24 acceptance: replace JL-noise-bound cosine-gap with LOO NN accuracy >= 0.75. Statistically powered at N=8.
- 4.25 acceptance: replace saturation-bound top-12 count with continuous starter-mass ratio > 1.10. Unbounded above, monotone in capacity.
- 4-meta rewritten: A/B/C/D axes replace seven-point P0..P3 attribute scheme. Gating downgrades for 4.23/4.24/4.25 until corrected metrics land.
- 4-meta.1 NEW: axis-coverage table required in every v3.45+ report.
- 7.7 NEW: channel-axis framing rules; ban value-judgment use of 'cipher works' language.
- 7.8 NEW: retract pre-v3.45 single-probe-implies-channel-existence statements.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
This was referenced Apr 20, 2026
Scope
This PR corrects `V331_BLACKBOX_TEST_SPEC.md`, not any SUT code. It supersedes an ambiguous reading of 密语系统 / cipher system used in the v3.37 cipher-probe introduction and carried through v3.38–v3.44-Trained.

What was wrong in the pre-v3.45 spec
- Section 1.1 (channel definition): 密语系统 ("cipher system") was introduced without a precise definition, and subsequent audit feedback interpreted it as "prefix attention alone must carry semantics; any use of content_bias or hard-mask counts as bypassing the cipher." This is not a standard interpretation, and it conflated two targets that should have been separate.
- 4.23: the acceptance criterion was `top-3 of (wte @ slot_1) ∩ rare_keywords >= 1`. Qwen 2.5 token ids 0/1/2 (`!`, `"`, `#`) sit near the WTE global mean; any top-K on an unnormalized cosine query is dominated by them regardless of slot content. The metric was measuring WTE geometry, not channel quality. 0/26 pass across v3.38–v3.44-Trained.
- 4.24: the acceptance criterion was `intra_domain_cos_mean - inter_domain_cos_mean >= 0.15` at N=3 memories per domain. JL projection variance into `d_ctx=128` is O(1/√N) ≈ 0.58, exceeding the threshold. 0/26 pass across v3.38–v3.44-Trained.
- 4.25: the acceptance criterion was `content_starters_top12_B >= content_starters_top12_A + 1`. It saturates at 12/12 in any configuration with a functioning channel, so monotone growth is impossible by construction. 0/26 pass across v3.38–v3.44-Trained.
- 4.22: the anti-cheating clause excluded hard-masking, but `ContentTokenClassifier.pure_function_mask` is a legitimate channel mechanism under the corrected definition.
- 4-meta: the seven-point attribute scheme (volume / vocabulary / anti-collapse / invocation granularity / cipher channel capacity / cipher expression form / disambiguation). This was design-stage motivation, not test semantics.

What this PR adds
- 4.23: median `rank_of_best_rare <= 100` out of vocab 151936. Passable by an actually functioning tail subchannel.
- 4.25: `mass_B / mass_A > 1.10`. Unbounded above, monotone in capacity, not saturation-bound.

What this PR does NOT do
- Change the `v331_blackbox_eval.py` runner code. The runner will be updated in a separate PR once the spec has landed.
- Change any SUT code (`scheme_b_vXXX.py`).

Runner update requirements (for a follow-up PR, not this one)
To implement the corrected 4.23/4.24/4.25 metrics, the runner must:
- Compute `wte_mean` once at runner startup and pass it to the 4.23 probe
- Add an sklearn-style LOO NN implementation (or a 20-line numpy equivalent) to the 4.24 probe
- Emit the axis-coverage table in `report.md`

Until that follow-up lands, v3.45+ audit runs that use the old runner will still fail 4.23/4.24/4.25 on the pre-correction metrics. The spec correction is valid independently; it defines the target that the runner update must meet.
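
For reviewers who want a concrete picture of the follow-up work, here is a minimal numpy sketch of the two helpers the runner requirements call for: a mean-centered rank query for 4.23 and a leave-one-out nearest-neighbour accuracy for 4.24. The function names `best_rare_rank` and `loo_nn_accuracy` are hypothetical and do not exist in `v331_blackbox_eval.py`. The sketch also assumes that "mean-centered" means subtracting `wte_mean` from both the embedding rows and the slot vector before cosine scoring, and that 4.24 uses cosine similarity; the spec text may define either differently.

```python
import numpy as np


def best_rare_rank(wte, slot, rare_ids, wte_mean):
    """1-based rank of the best rare keyword under a mean-centered cosine query.

    Assumption: embedding rows and slot vector are both centered on wte_mean
    before scoring; this is an illustrative reading, not the spec's wording.
    """
    centered = np.asarray(wte, dtype=np.float64) - wte_mean
    centered /= np.linalg.norm(centered, axis=1, keepdims=True) + 1e-12
    query = np.asarray(slot, dtype=np.float64) - wte_mean
    query /= np.linalg.norm(query) + 1e-12
    scores = centered @ query
    best = scores[np.asarray(rare_ids)].max()
    # Rank = 1 + number of tokens scoring strictly above the best rare keyword.
    return int((scores > best).sum()) + 1


def loo_nn_accuracy(vectors, labels):
    """Leave-one-out 1-nearest-neighbour accuracy under cosine similarity."""
    x = np.asarray(vectors, dtype=np.float64)
    y = np.asarray(labels)
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)  # a sample may not be its own neighbour
    nearest = sim.argmax(axis=1)
    return float((y[nearest] == y).mean())


# Toy check on a 4-token vocabulary whose mean is the origin.
toy_wte = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
toy_mean = toy_wte.mean(axis=0)  # computed once, as the runner would at startup
toy_rank = best_rare_rank(toy_wte, np.array([1.0, 0.1]), [1], toy_mean)  # -> 2
```

Under this reading, a 4.23 run would collect `best_rare_rank` per memory and compare the median against 100 (with a top-20 hit corresponding to a rank of at most 20), and a 4.24 run would compare `loo_nn_accuracy` against 0.75. How the per-memory results are aggregated is left to the follow-up PR.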