v3.44-Trained black-box audit: 60-step CPU training breaks 17/26 plateau #17
Draft
FluffyAIcode wants to merge 1 commit into main from
Conversation
- scheme_b_v344.py: v3.42 clone + [J-1] AMS_TRAINED_WEIGHTS env hook
- train_v344.py: CPU training driver (60 steps, 398.5s)
- ckpt/train_log.jsonl + train_stdout.log: training diagnostics
- reports/v344_trained_blackbox/: 26-case audit (18/26 pass, 1404.3s)
- audit_feedback.md: Section 7 compliant analysis

Delta vs v3.42 (untrained 17/26):
- FAIL -> PASS: 4.12 prefix_stepwise_drift_trajectory, 4.21 decode_repetition_feedback_probe
- PASS -> FAIL: 4.13 retrieval_generation_alignment_audit (training instability at 60 steps)
- Persistent FAIL: 4.7, 4.10, 4.15, 4.17, 4.23, 4.24, 4.25

First 26-case run to exceed the 17+/-1 eval-time plateau.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
Scope
- `scheme_b_v344.py` = `scheme_b_v342.py` + the [J-1] weight-load hook. No `Cfg` changes.
- Training: `Trainer.step()` on CPU for 60 steps (batch 3, Adam lr=1e-4). Took 398.5 s.
- Audit: 26-case black-box run with `AMS_TRAINED_WEIGHTS=ckpt/v344_trained.pt`. Took 1404.3 s.

Result

18/26 pass, the first 26-case run to exceed the 17±1 eval-time plateau held across v3.37 → v3.43.
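The [J-1] weight-load hook can be sketched roughly as below. Only the `AMS_TRAINED_WEIGHTS` variable and the checkpoint path come from this PR; the function name and the `apply_state` callback are assumptions standing in for the real loading code:

```python
import os

def maybe_load_trained_weights(model_state, apply_state):
    """Sketch of the [J-1] hook: keep the v3.42 init unless
    AMS_TRAINED_WEIGHTS points at an existing checkpoint."""
    ckpt_path = os.environ.get("AMS_TRAINED_WEIGHTS")
    if not ckpt_path:
        return "untrained"  # v3.42 eval-time behaviour
    if not os.path.exists(ckpt_path):
        raise FileNotFoundError(ckpt_path)
    # `apply_state` stands in for the real loader,
    # e.g. torch.load(...) followed by load_state_dict(...)
    apply_state(model_state, ckpt_path)
    return "trained"
```

With `AMS_TRAINED_WEIGHTS=ckpt/v344_trained.pt` set, the audited model runs with the 60-step weights; unset, it falls back to the untrained v3.42 behaviour, so the same audit harness covers both configurations.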
Delta vs v3.42 (untrained, 17/26)
- FAIL → PASS: 4.12 `prefix_stepwise_drift_trajectory`; 4.21 `decode_repetition_feedback_probe`
- PASS → FAIL: 4.13 `retrieval_generation_alignment_audit` (training instability @ 60 steps; output drifts into Qwen's multilingual token space)
- Persistent FAIL: 4.7, 4.10, 4.15, 4.17, 4.23, 4.24, 4.25

Mechanism: why training broke the plateau
- Learned weights: `vocab_proj` and `reranker`. `vocab_proj.proj[-1]` was zero-init in v3.42 (std = 0); after 60 steps std = 7e-4, enough to add a ~+1 logit semantic boost to content tokens at step 0, breaking the "key key key" attractor.
- `context_separation` (→ 0 by step 14) indicates that the loss is mis-specified (it clamps all pairs; see §4.3 in feedback).

Hypothesis test: which FAILs are trainable?
Pre-training prediction matrix:
- `vocab_proj` std too low at 60 steps

The "eval-time vs training-time" partitioning was directionally correct, but the case-specific assignments were wrong. Learned `vocab_proj`/`reranker` weights carry more degrees of freedom than any `Cfg` scalar, which is why training broke cases that scalar tuning could not.

Next-step projections (not executed)
- Convert `context_separation_loss` to triplet form → expect 4.24 PASS. No `Cfg` changes.

Artifacts
- `scheme_b_v344.py` + `train_v344.py`
- `ckpt/v344_trained.pt` (453 MB; not tracked, reproducible by `python3 train_v344.py --steps 60`)
- `ckpt/train_log.jsonl` + `ckpt/train_stdout.log`
- `reports/v344_trained_blackbox/{report.json, report.md, runner.log, audit_feedback.md}`
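The triplet-form rewrite of `context_separation_loss` projected above might look like the sketch below. The signature, margin value, and euclidean distance are assumptions, not taken from the repo; the point is that a triplet loss clamps only the relative ordering, so it cannot collapse to 0 the way a loss that clamps every pair independently can:

```python
import numpy as np

def triplet_context_separation(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: pull the anchor toward its own
    context embedding, push it away from a foreign one."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    # Loss reaches 0 only once d_pos + margin <= d_neg,
    # i.e. contexts are actually separated, not merely clamped.
    return max(0.0, d_pos - d_neg + margin)
```

If all embeddings collapse to a point, this loss stays pinned at `margin` instead of reporting 0, which is the failure mode the current all-pairs clamp allows.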