v3.40 black-box audit#13
Draft
FluffyAIcode wants to merge 2 commits intomainfrom
Draft
Conversation
v3.40 [F-1..F-7]:
[F-1] prepare_decode_context / generate default update_stats=False.
Memory is immutable during inference; save -> generate -> load ->
generate is a pure function of (mem_state, prompt, rng).
[F-2] AMM._preserve_min_keep applied at every retrieval filter stage
(strict_overlap, upstream, hard, score, coherence, bidi_gap,
mean_center). Cfg.retrieval_min_keep_for_rerank=5. Cfg.mc_min_keep
1 -> 3. RetrievalDiag.min_keep_enforcements counts invocations.
[F-3] MemLLM.fwd adds pure_function_mask penalty when guidance is active.
Cfg.use_fwd_function_suppression, fwd_function_suppression_scale=5.0,
fwd_function_suppression_decay=0.04, fwd_function_suppression_floor=0.3.
Independent of shape_step_logits [E-3] so audit probes that sample
fwd output directly observe the margin shift.
[F-4] _compute_rare_keyword_wte_residual uses target_scale = sqrt(d_LLM)
matching post-LN slot magnitude. Residual magnitude now coherent
with slot_head output instead of target_std * sqrt(d_LLM) which was
order-of-magnitude larger on average.
[F-5] MemoryContextEncoder: Linear -> LN -> SiLU -> Linear -> LN -> SiLU
-> Linear. Orthogonal init on all 3 Linears. encode() applies
per-sample mean-centering before L2-normalize to remove the
constant-bias drift that pulled v3.39 descriptors toward one axis.
[F-6] effective_tail_slots = base + (L_mem - 8) // 2. keyword_tail_top_k
8. Slot s in [1, n_slots-1] receives the (s-1)-th rare keyword
centroid as residual, so tail slots anchor to distinct content
directions instead of sharing one.
[F-7] fwd_path_bias_dampen 0.3 -> 0.25; wte_residual_alpha 0.6 -> 0.5.
Reduces aggregate shaping strength applied at high-retrieval
queries (targets the 4.14 correlation regression from v3.39).
MemEntry fields and MemLLM.save_memory/load_memory preserve context_descriptor.
DecodeContext.mixture_gate / memory_logit_bias present; Cfg.use_mixture_decoding
remains False by default (set to True by probe 4.26).
All prior [C-*]/[D-*]/[E-*] fixes preserved. No mocks, no fallbacks.
Audit runner v331_blackbox_eval.py unchanged on this branch.
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
Artifacts: report.json, report.md, runner.log. Feedback file follows V331_BLACKBOX_TEST_SPEC.md Section 7: run parameters, 26-row per-case table, count summary (pass=16, fail=10, ni=0, error=0, blocking=8), delta vs v3.39 (3 state changes), per-failing-case evidence for all 10 fails with measured metric, threshold, and gap, 6 falsifiable mechanism notes (H1-H6), artifact links. No celebratory / consolation / hype / emotive language. Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
1. Run parameters
scheme_b_v340.pyv331_blackbox_eval.py(unchanged)Qwen/Qwen2.5-1.5B-Instruct(bf16)2. Count summary
3. Delta vs. v3.39
4. Cross-version pass counts (original 4.1 – 4.19)
5. Failing-case evidence (measured)
space_margin > 0> 00.0first_bad_step >= 3>= 30; row 1:4−3output_a == output_bdelta ≥ 1.50.3333mean_intersection ≥ 1.00.0intra − inter ≥ 0.15(both)0.0909, space0.0290starters_B ≥ starters_A + 1B ≥ 4(A=3)26. Full report
reports/v340_blackbox/audit_feedback.md(Section 7 compliant)reports/v340_blackbox/report.jsonreports/v340_blackbox/report.mdreports/v340_blackbox/runner.log7. Compliance note
This description and
audit_feedback.mdconform toV331_BLACKBOX_TEST_SPEC.mdSection 7. Mechanism notes H1–H6 are marked non-normative and stated as falsifiable predictions tied to named code elements.