
v3.42 black-box audit: H-1..H-5 convergence fixes for 9 v3.41 failures #15

Draft
FluffyAIcode wants to merge 2 commits into main from AgentMemory/v342-blackbox-audit-7e97

FluffyAIcode commented Apr 20, 2026

Scope

  • SUT: scheme_b_v342.py via AgentMemorySystem.py redirect.
  • Runner: v331_blackbox_eval.py, unmodified.
  • Spec: V331_BLACKBOX_TEST_SPEC.md, unmodified.

v3.42 architectural changes

| Tag | Target cases | Mechanism |
| --- | --- | --- |
| H-1 | 4.7 / 4.8 / 4.10 / 4.15 / 4.17 / 4.21 | content_bias / suppression_bias applied only in fwd path; shape_step_applies_*_bias=False; relevance_floor=0.30, concentration=1.5, cyclic_content_max_count=3 |
| H-2 | 4.23 | ContentSemanticTailHead.zero_init_tied=True; wte_residual_alpha=1.5 |
| H-3 | 4.24 | MemoryContextEncoder = single orthogonal Linear, no LN/SiLU |
| H-4 | 4.17 | .detach().cpu().clone().contiguous() both sides of save/load; (-idf, id) tie-break |
| H-5 | 4.25 | Carry-over from H-1 (A not saturated) + H-2 (slot 2+ residuals differentiated) |
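
The (-idf, id) tie-break listed under H-4 can be illustrated with a minimal sketch. The function name `rank_token_ids` and the sample scores are hypothetical and do not come from scheme_b_v342.py; the only thing taken from the table is the sort key itself.

```python
from typing import Dict, List


def rank_token_ids(idf: Dict[int, float]) -> List[int]:
    """Order token ids by descending IDF, breaking ties by ascending id.

    Hypothetical helper: it only illustrates the (-idf, id) sort key.
    """
    # Sorting on the (-idf, id) tuple makes the result independent of dict
    # insertion order, so tokens with equal IDF cannot swap places between
    # runs or across a save/load round trip.
    return sorted(idf, key=lambda tok: (-idf[tok], tok))


# Illustrative scores only; ids 42 and 13 tie on IDF.
scores = {42: 1.5, 7: 2.0, 13: 1.5, 99: 0.25}
ranked = rank_token_ids(scores)
print(ranked)  # [7, 13, 42, 99]
```

With a key of `-idf` alone, the relative order of ids 42 and 13 would depend on insertion order, which is one way a ranking can differ after a reload.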

Audit result

  • 26 cases, elapsed 1418.4 s on CPU.
  • Pass: 17 / 26, Fail: 9 / 26.
  • v3.41 baseline: 17 / 26 pass, 1437.7 s.

Delta vs v3.41:

| Transition | Count | Cases |
| --- | --- | --- |
| FAIL → PASS | 1 | 4.8 |
| PASS → FAIL | 1 | 4.12 |
| Persistent FAIL | 8 | 4.7, 4.10, 4.15, 4.17, 4.21, 4.23, 4.24, 4.25 |
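
The delta table can be reproduced from two sets of failing case ids. The sketch below is a hedged illustration: the failing sets are taken from the table, but the helper names and the plain-dict representation are assumptions and do not reflect the actual schema of report.json.

```python
from typing import Dict, List, Tuple


def case_key(case: str) -> Tuple[int, ...]:
    # Sort "4.10" after "4.8" by comparing numeric parts, not strings.
    return tuple(int(part) for part in case.split("."))


def classify(before: Dict[str, bool], after: Dict[str, bool]) -> Dict[str, List[str]]:
    """Bucket cases by pass/fail transition (True means the case passed)."""
    buckets: Dict[str, List[str]] = {
        "fail_to_pass": [],
        "pass_to_fail": [],
        "persistent_fail": [],
    }
    for case in sorted(before.keys() & after.keys(), key=case_key):
        was_ok, is_ok = before[case], after[case]
        if not was_ok and is_ok:
            buckets["fail_to_pass"].append(case)
        elif was_ok and not is_ok:
            buckets["pass_to_fail"].append(case)
        elif not was_ok and not is_ok:
            buckets["persistent_fail"].append(case)
    return buckets


# Failing case ids as listed in the delta table.
v341_failed = {"4.7", "4.8", "4.10", "4.15", "4.17", "4.21", "4.23", "4.24", "4.25"}
v342_failed = {"4.7", "4.10", "4.12", "4.15", "4.17", "4.21", "4.23", "4.24", "4.25"}

# Only cases that failed in at least one version are needed for the delta.
cases = v341_failed | v342_failed
delta = classify(
    {c: c not in v341_failed for c in cases},
    {c: c not in v342_failed for c in cases},
)
for name, members in delta.items():
    print(name, len(members), ", ".join(members))
```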

Net pass count is unchanged. 4.8 is fixed by H-1 (double-add removal). 4.12 regressed because H-2's α=1.5 residual at native WTE scale forms an attractor basin at step 0 when prompt tokens overlap with memory rare keywords. This is the trade-off documented in v3.41 §5.5, falsifiable experiment D1.

wte_residual_alpha is subject to three mutually opposed constraints: 4.23 wants ≥1.5, 4.12 wants ≤0.5, and 4.25 wants a clamp-compatible scale. No scalar α satisfies all three simultaneously on this corpus.
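
The infeasibility already follows from the two numeric bounds alone, as the interval sketch below shows. It is a minimal illustration: the `intersect` helper is hypothetical, and the 4.25 clamp constraint is left out because no numeric range is given for it here.

```python
import math
from typing import Optional, Tuple

Interval = Tuple[float, float]  # closed interval (lo, hi)


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two closed intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


alpha_for_4_23 = (1.5, math.inf)   # 4.23 wants alpha >= 1.5
alpha_for_4_12 = (-math.inf, 0.5)  # 4.12 wants alpha <= 0.5

feasible = intersect(alpha_for_4_23, alpha_for_4_12)
print(feasible)  # None: no scalar alpha meets both bounds
```

Because the two ranges are disjoint, any fix would have to make α depend on something other than a single global scalar, or change one of the criteria.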

Other structural observations:

  • 4.23 succeeds structurally (tail slot direction now deterministic = rare-keyword centroid) but top-3 intersection criterion not met.
  • 4.24 now passes the space gap (0.152 ≥ 0.15), but the music gap went negative (−0.084); this is attributed to small-corpus sample variance.
  • 4.25 first condition unchanged (A saturated); second condition now fails (slot_norm_ratio=0.784 < 0.85) due to H-2's larger residual consuming clamp budget.
  • 4.17 unchanged: the non-determinism is deeper than save/load serialization.
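
The numeric observations above reduce to simple lower-bound checks, sketched below. The measured values and the 0.15 and 0.85 thresholds are the ones quoted in the list. Applying the 0.15 threshold to the music gap as well is an assumption, and the check names are hypothetical labels, not identifiers from the runner.

```python
from typing import Dict, List, Tuple

# (label, measured value, minimum required value)
checks: List[Tuple[str, float, float]] = [
    ("4.24 space gap", 0.152, 0.15),
    ("4.24 music gap", -0.084, 0.15),  # threshold assumed equal to the space gap's
    ("4.25 slot_norm_ratio", 0.784, 0.85),
]


def evaluate(items: List[Tuple[str, float, float]]) -> Dict[str, bool]:
    """Map each label to True when the measured value meets its minimum."""
    return {label: value >= minimum for label, value, minimum in items}


results = evaluate(checks)
for label, ok in results.items():
    print(f"{label}: {'PASS' if ok else 'FAIL'}")
```

The space gap clears its threshold by only 0.002, which is consistent with treating the 4.24 result as sensitive to sample variance.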

Artifacts

  • reports/v342_blackbox/report.json
  • reports/v342_blackbox/report.md
  • reports/v342_blackbox/runner.log
  • reports/v342_blackbox/audit_feedback.md (Section 7 compliant)

cursoragent and others added 2 commits April 20, 2026 12:08
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>