
Spec correction: compression-communication channel definition (Section 1.1) + rewrite of probes 4.22–4.25 + axis-coverage reporting #18

Draft
FluffyAIcode wants to merge 2 commits into main from AgentMemory/spec-v331-compression-channel-correction-7e97


@FluffyAIcode
Owner

Scope

This PR corrects V331_BLACKBOX_TEST_SPEC.md, not any SUT code. It supersedes an ambiguous reading of the term "cipher system" (密语系统). That term was introduced with the v3.37 cipher probes and carried through v3.38-v3.44-Trained.

What was wrong in the pre-v3.45 spec

  1. The term "cipher system" (密语系统) was introduced without a precise definition. Subsequent audit feedback interpreted it as "prefix attention alone must carry semantics; any use of content_bias or hard-mask counts as bypassing the cipher." This is not a standard interpretation, and it conflated two targets that should have been kept separate.
  2. Three structural probes had unreachable or defective acceptance criteria:
    • 4.23: top-3 of (wte @ slot_1) ∩ rare_keywords >= 1. Qwen 2.5 token ids 0/1/2 (!, ", #) sit near the WTE global mean. Any top-K over an uncentered (not mean-subtracted) cosine query is therefore dominated by them, regardless of slot content. The metric was measuring WTE geometry, not channel quality. 0/26 pass across v3.38–v3.44-Trained.
    • 4.24: intra_domain_cos_mean - inter_domain_cos_mean >= 0.15 at N=3 memories per domain. After JL projection into d_ctx=128, the noise on these per-domain means scales as O(1/√N). At N=3 that is roughly 1/√3 ≈ 0.58, well above the 0.15 threshold. 0/26 pass across v3.38–v3.44-Trained.
    • 4.25: content_starters_top12_B >= content_starters_top12_A + 1. The count saturates at 12/12 in any configuration with a functioning channel. Once A reaches 12/12, the required increase of at least 1 is impossible by construction. 0/26 pass across v3.38–v3.44-Trained.
  3. The anti-cheating clause on 4.22 explicitly banned hard-masking as a solution path. Under the corrected definition, however, hard-masking derived from ContentTokenClassifier.pure_function_mask is a legitimate channel mechanism.
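  4. Section 4-meta mapped each probe to an attribute from a seven-point "P0..P3 proposals" scheme: volume (声量) / vocabulary (词汇表) / anti-collapse (抗塌缩) / invocation granularity (调用精细度) / cipher-channel capacity (密语信道容量) / cipher expression form (密语表达形式) / disambiguation (消歧). This was design-stage motivation, not test semantics.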
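The 4.23 failure mode in item 2 can be reproduced on a toy table. The sketch below is illustrative only, and its inputs are assumptions:

- The 5-token, 6-dimensional embedding matrix and the slot vector are invented numbers, not Qwen 2.5's WTE or any real slot.
- Ranking by plain, uncentered cosine is an assumed stand-in for the old probe's query.
- Tokens 0-2 play the role of the near-mean punctuation ids. Token 3 is the rare keyword that the slot weakly encodes.

```python
import numpy as np

def cosine_scores(wte, query, eps=1e-8):
    """Cosine similarity between every vocabulary row and one query vector."""
    rows = wte / (np.linalg.norm(wte, axis=1, keepdims=True) + eps)
    q = query / (np.linalg.norm(query) + eps)
    return rows @ q

def top_k_ids(scores, k):
    """Set of the k highest-scoring token ids."""
    return set(np.argsort(-scores)[:k].tolist())

# Toy embedding table (invented values). Every row shares a large common
# component on dim 0, a crude stand-in for embedding anisotropy.
wte = np.array([
    [10.0, 0.0, 0.0, 0.0, 0.0, 0.05],  # ids 0-2: closest to the global mean,
    [10.0, 0.0, 0.0, 0.0, 0.05, 0.0],  # playing the role of punctuation tokens
    [10.0, 0.0, 0.0, 0.05, 0.0, 0.0],
    [10.0, 3.0, 0.0, 0.0, 0.0, 0.0],   # id 3: the rare keyword the slot encodes
    [10.0, 0.0, 3.0, 0.0, 0.0, 0.0],   # id 4: an unrelated rare token
])
slot_1 = np.array([10.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # weak content signal on dim 1
rare_keywords = {3}

# Old-style query: raw cosine, no centering.
raw_top3 = top_k_ids(cosine_scores(wte, slot_1), 3)

# Mean-centered query: subtract the global WTE mean from rows and query alike.
wte_mean = wte.mean(axis=0)
centered_top1 = top_k_ids(cosine_scores(wte - wte_mean, slot_1 - wte_mean), 1)

print(sorted(raw_top3), raw_top3 & rare_keywords)  # [0, 1, 2] set()
print(sorted(centered_top1))                       # [3]
```

Raw cosine ranks the three near-mean rows first because the shared component on dimension 0 swamps the weak content signal. Subtracting the global mean from both sides removes that component, and the rare keyword then ranks first.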

What this PR adds

| Addition | Purpose |
|---|---|
| Section 1.1 | Precise four-axis definition. A: compression ratio. B: O(1)-in-N injection cost. C: semantic fidelity vs naive RAG. D: channel stability. |
| 1.1.2 | Explicit enumeration of legitimate channel mechanisms (prefix, content_bias, suppression_bias, fwd_function_suppression, mean-centered residual, retrieval ranking, context_descriptor) |
| 1.1.3 | Narrower ban list: prompt-keyed routing, mocks, memorised templates, audit-only code paths, stub backbones |
| 1.1.4 | Historical note on the v3.38-v3.44 probe misspecification |
| 4.23 correction | Mean-centered top-20 intersection + median rank_of_best_rare <= 100 (vocabulary size 151936). Passable by an actually functioning tail subchannel. |
| 4.24 correction | LOO NN classification accuracy >= 0.75 at N=8. The Clopper-Pearson CI is bounded; JL variance no longer dominates. |
| 4.25 correction | Starter-mass ratio mass_B / mass_A > 1.10. Unbounded above, monotone in capacity, not saturation-bound. |
| 4.22 correction | Remove the hard-masking exclusion. Metric and threshold retained. |
| 4.21, 4.20, 4.26 correction | Axis re-mapping only; no metric changes |
| 4-meta rewrite | A/B/C/D axes replace the seven-point attribute scheme. The gating table shows the pre-v3.45 vs v3.45+ distinction. |
| 4-meta.1 | NEW: axis-coverage table required in every v3.45+ audit report |
| Section 7.7 | Channel-axis framing rules for human-authored reports. Bans value-judgment use of "cipher works"; requires axis-specific numeric claims instead. |
| Section 7.8 | Retraction notice: pre-v3.45 feedback documents that claimed a single probe's PASS or FAIL established or refuted channel existence are superseded. |
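A minimal sketch of the corrected 4.24 metric follows. It rests on several assumptions that the spec text here does not settle:

- Each memory is represented by one vector (for instance its d_ctx=128 context vector), labelled with its domain.
- The nearest-neighbour search uses cosine similarity.
- The gate uses the point estimate (accuracy >= 0.75), and the Clopper-Pearson interval is only reported.
- `probe_4_24`, `loo_nn_correct` and `clopper_pearson` are hypothetical names.

The interval is computed exactly by bisecting the binomial CDF, so the sketch needs no scipy dependency.

```python
import math
import numpy as np

def loo_nn_correct(X, labels):
    """Leave-one-out 1-NN: count points whose nearest *other* point
    (by cosine similarity) carries the same domain label."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)  # a point may not be its own neighbour
    nearest = np.argmax(sim, axis=1)
    return int(np.sum(labels[nearest] == labels)), len(labels)

def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))

def _bisect_increasing(g, iters=80):
    """Root on [0, 1] of a function g that increases monotonically in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) Clopper-Pearson interval for k successes in n."""
    # Lower bound: P(X >= k | p) = alpha/2; this tail probability grows with p.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: (1.0 - _binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2; this CDF shrinks as p grows.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: alpha / 2 - _binom_cdf(k, n, p))
    return lower, upper

def probe_4_24(X, labels, threshold=0.75):
    """Hypothetical 4.24 check: LOO NN accuracy plus its 95% exact interval."""
    correct, n = loo_nn_correct(X, labels)
    acc = correct / n
    return {"accuracy": acc, "ci95": clopper_pearson(correct, n), "pass": acc >= threshold}
```

Reporting the interval next to the point estimate makes it visible how much of a PASS or FAIL at N=8 per domain could be sampling noise.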

What this PR does NOT do

  • Does not modify v331_blackbox_eval.py runner code. The runner will be updated in a separate PR once the spec has landed.
  • Does not rewrite pre-v3.45 audit feedback documents. Retroactive retraction is handled by Section 7.8.
  • Does not change any SUT (scheme_b_vXXX.py).
  • Does not change pass/fail gating of any Section 4.1–4.19 case.

Runner update requirements (for a follow-up PR, not this one)

To implement the corrected 4.23/4.24/4.25 metrics, the runner must:

  • compute wte_mean once at runner startup and pass it to the 4.23 probe
  • add a leave-one-out nearest-neighbour (LOO NN) implementation to the 4.24 probe, either sklearn-style or a 20-line numpy equivalent
  • compute baseline-shifted starter mass in the 4.25 probe
  • emit the Section 4-meta.1 axis-coverage table at the end of report.md
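A sketch of the runner-side pieces for 4.23 and 4.25 follows. It is non-authoritative, and every function name is hypothetical.

For 4.23, the sketch makes these assumptions:

- Similarity is cosine between mean-centered vocabulary rows and a mean-centered slot query.
- Ranks are 1-based.
- The median of rank_of_best_rare is taken across the probe's slot queries.
- The top-20 intersection is only reported, because the threshold for it is not restated in this PR.

For 4.25, the sketch makes these assumptions:

- "Starter mass" is the softmax probability summed over a set of content-starter token ids.
- A and B are the probe's two conditions as defined in the spec.
- The baseline shift mentioned above is not reproduced, since its definition is not given here.

```python
import numpy as np

def compute_wte_mean(wte):
    """Global mean of the token-embedding table; computed once at runner startup."""
    return np.asarray(wte, dtype=float).mean(axis=0)

def centered_cosine_scores(wte, wte_mean, query, eps=1e-8):
    # Center BOTH the vocabulary rows and the query on wte_mean. Centering only
    # the rows would leave a dot-product ranking unchanged, because
    # wte_mean @ query is the same constant for every token.
    rows = wte - wte_mean
    q = query - wte_mean
    rows = rows / (np.linalg.norm(rows, axis=1, keepdims=True) + eps)
    q = q / (np.linalg.norm(q) + eps)
    return rows @ q

def probe_4_23(wte, wte_mean, slot_vecs, rare_ids, top_k=20, max_median_rank=100):
    """Hypothetical corrected 4.23: top-k intersection and median rank_of_best_rare."""
    rare = np.asarray(sorted(rare_ids))
    hits, best_ranks = [], []
    for slot in slot_vecs:
        order = np.argsort(-centered_cosine_scores(wte, wte_mean, slot))
        rank = np.empty(len(order), dtype=int)
        rank[order] = np.arange(1, len(order) + 1)  # rank 1 = highest-scoring token
        hits.append(len(set(order[:top_k].tolist()) & set(rare.tolist())))
        best_ranks.append(int(rank[rare].min()))    # rank_of_best_rare for this slot
    median_rank = float(np.median(best_ranks))
    return {"top_k_hits": hits,
            "median_rank_of_best_rare": median_rank,
            "median_rank_ok": median_rank <= max_median_rank}

def starter_mass(logits, starter_ids):
    """Softmax probability summed over the content-starter token ids."""
    z = np.asarray(logits, dtype=float)
    p = np.exp(z - z.max())  # subtract the max for numerical stability
    p /= p.sum()
    return float(p[list(starter_ids)].sum())

def probe_4_25(logits_a, logits_b, starter_ids, min_ratio=1.10):
    """Hypothetical corrected 4.25: continuous ratio mass_B / mass_A."""
    mass_a = starter_mass(logits_a, starter_ids)
    mass_b = starter_mass(logits_b, starter_ids)
    ratio = mass_b / mass_a
    return {"mass_A": mass_a, "mass_B": mass_b, "ratio": ratio, "pass": ratio > min_ratio}
```

With a 151936-row vocabulary, a real runner would probably center and normalize the table once, next to wte_mean, rather than on every query. It is recomputed per call here only to keep each function short.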

Until that follow-up lands, v3.45+ audits run with the old runner will still score 4.23/4.24/4.25 with the pre-correction metrics, and will still fail them. The spec correction is valid on its own: it defines the target that the runner update must meet.


cursoragent and others added 2 commits April 20, 2026 15:32
- scheme_b_v344.py: v3.42 clone + [J-1] AMS_TRAINED_WEIGHTS env hook
- train_v344.py: CPU training driver (60 steps, 398.5s)
- ckpt/train_log.jsonl + train_stdout.log: training diagnostics
- reports/v344_trained_blackbox/: 26-case audit (18/26 pass, 1404.3s)
- audit_feedback.md: Section 7 compliant analysis

Delta vs v3.42 (untrained 17/26):
  FAIL -> PASS: 4.12 prefix_stepwise_drift_trajectory, 4.21 decode_repetition_feedback_probe
  PASS -> FAIL: 4.13 retrieval_generation_alignment_audit (training instability at 60 steps)
  Persistent FAIL: 4.7, 4.10, 4.15, 4.17, 4.23, 4.24, 4.25

First 26-case run to exceed the 17 ± 1 eval-time plateau.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
…1.1); rewrite probes 4.22-4.25 that measured unreachable artifacts; add axis-coverage reporting (7.7, 7.8); retract pre-v3.45 single-probe channel-existence claims

Summary of corrections:
- NEW Section 1.1: precise four-axis definition (A compression / B cost / C fidelity / D stability). Replaces ambiguous 'cipher system' label used since v3.37.
- 1.1.2 explicitly legitimises prefix attention + content_bias + suppression_bias + FS + mean-centered residual as channel mechanisms (not cheats).
- 1.1.3 narrows 'banned' to the actually-banned list (prompt-keyed routing, mocks, corpus-memorised templates, per-probe code paths, stub backbones).
- 4.22 anti-cheating: remove exclusion of hard-masking. Axis = C.
- 4.21 rationale: reframe as D-axis operating-point metric, not 'anti-collapse'.
- 4.23 acceptance: replace unreachable top-3 with mean-centered top-20 + median rank <= 100. Structurally achievable.
- 4.24 acceptance: replace JL-noise-bound cosine-gap with LOO NN accuracy >= 0.75. Statistically powered at N=8.
- 4.25 acceptance: replace saturation-bound top-12 count with continuous starter-mass ratio > 1.10. Unbounded above, monotone in capacity.
- 4-meta rewritten: A/B/C/D axes replace seven-point P0..P3 attribute scheme. Gating downgrades for 4.23/4.24/4.25 until corrected metrics land.
- 4-meta.1 NEW: axis-coverage table required in every v3.45+ report.
- 7.7 NEW: channel-axis framing rules; ban value-judgment use of 'cipher works' language.
- 7.8 NEW: retract pre-v3.45 single-probe-implies-channel-existence statements.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>