Phase I MVP: scaffold + CIFAR-10→CIFAR-100 continual pipeline + smoke tests #2
…smoke tests

Implements the minimum viable subset of the HAL sleep-wake framework (paper §2.4) on a vision stack for Phase I measurement:

- L1 task-batch ring buffer (`hal_sleep_wake/memory.py`)
- L2 LoRA adapter on ResNet-18 `Conv2d` + `fc` (`hal_sleep_wake/model.py`)
- L3 ResNet-18 backbone: fresh init, no pretrained weights
- Wake: SGD over LoRA-only params, every batch pushed into L1 (`wake.py`)
- Sleep (NREM): merge LoRA deltas into base weights with a safety `sleep_scale` multiplier, re-init A=Kaiming/B=zeros, wipe optimizer state (`sleep.py`)
- Continual metrics: accuracy matrix + average accuracy + forgetting rate per paper §5 (`metrics.py`)
- CIFAR-10 → CIFAR-100 dry-run-able entry point (`scripts/train_cifar10_cifar100.py`, `--dry-run` imports no torch)
- Eval helper (`scripts/eval.py`) for replaying a run's JSON dump

Tests (CPU, no dataset download):

- `tests/test_memory.py`: 5 tests
- `tests/test_metrics.py`: 7 tests
- `tests/test_model.py`: 3 tests (importorskip on torch/peft)
- `tests/test_smoke.py`: 5 tests including a dry-run subprocess and an end-to-end sleep-merge roundtrip

CI (`.github/workflows/ci.yml`): Python 3.11, CPU-only torch wheel index, runs `ruff check .` + `pytest -q`.

Scope boundary: Phase I MVP only. REM synthetic-dream generation and entropy-aware context compression (paper §2.4 lines 85–101) land in follow-up PRs. No GPU run numbers yet because CI has no GPU; the measurement PR will fill `results/baseline_vs_hal.md`.

Closes part of #1.

Co-Authored-By: Zhuge <z@2lab.ai>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4f2d8db88f
    def reset_optimizer_state(optimizer: torch.optim.Optimizer) -> None:
        """Wipe optimizer accumulators (momentum, second moments, ...)."""
        optimizer.state = {}
Clear optimizer state without changing its mapping type
reset_optimizer_state assigns optimizer.state = {}, which replaces PyTorch's expected state mapping with a plain dict. In the default flow (train_cifar10_cifar100.py with sleep enabled), sleep runs after task 0 and the next optimizer.step() on task 1 will look up per-parameter state via self.state[p]; with a plain dict this raises KeyError, so the main HAL training path crashes before finishing the sequence.
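The failure mode described above can be reproduced without torch. The sketch below is a stand-in, not PyTorch code: `ToyOptimizer`, `reset_by_rebinding`, and `reset_in_place` are hypothetical names, and the only property borrowed from `torch.optim.Optimizer` is that its `state` attribute is a `defaultdict(dict)`.

```python
from collections import defaultdict


class ToyOptimizer:
    """Hypothetical stand-in whose per-parameter state is a defaultdict."""

    def __init__(self):
        # Missing keys are created on first access, as with defaultdict(dict).
        self.state = defaultdict(dict)

    def step(self, param_name):
        # Mirrors the `self.state[p]` lookup performed during a real step.
        slot = self.state[param_name]
        slot["steps"] = slot.get("steps", 0) + 1
        return slot["steps"]


def reset_by_rebinding(optimizer):
    # Replaces the mapping with a plain dict, so the default factory is lost.
    optimizer.state = {}


def reset_in_place(optimizer):
    # Drops the contents but keeps the defaultdict and its factory.
    optimizer.state.clear()
```

After `reset_by_rebinding`, the next `step` raises `KeyError` on the first lookup; after `reset_in_place`, the same call starts again from a fresh, empty slot.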
    original = layer.scaling[adapter_name]
    layer.scaling[adapter_name] = original * float(scale)
    try:
        layer.merge(adapter_names=[adapter_name])
Unmerge adapter after consolidation to keep wake updates active
After calling layer.merge(...), the code never unmerges or clears merged adapter state before continuing training. PEFT LoRA layers skip adapter contributions in forward when marked merged, so after the first sleep cycle subsequent wake steps update LoRA tensors that are no longer applied; later sleep cycles also fail to consolidate new deltas. This makes continual learning behavior incorrect after task 0.
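A scalar toy model makes the merged-flag problem concrete. This is a sketch under stated assumptions, not peft's implementation: `ToyLoraLayer` and `sleep` are hypothetical, weights are plain floats, and the only behavior mirrored from the review comment is that a layer with a non-empty `merged_adapters` list skips the adapter term in its forward pass.

```python
class ToyLoraLayer:
    """Scalar toy of a LoRA layer (hypothetical; not peft's code)."""

    def __init__(self, weight, a, b, scaling):
        self.weight, self.a, self.b, self.scaling = weight, a, b, scaling
        self.merged_adapters = []  # non-empty means "already folded into weight"

    def forward(self, x):
        out = self.weight * x
        if not self.merged_adapters:
            # The adapter path is only applied while the layer is unmerged.
            out += self.scaling * self.b * self.a * x
        return out

    def merge(self, name="default"):
        # Fold the adapter delta into the base weight and mark it merged.
        self.weight += self.scaling * self.b * self.a
        self.merged_adapters.append(name)


def sleep(layer, clear_merged):
    """Merge, zero B, and optionally clear the merged marker."""
    layer.merge()
    layer.b = 0.0
    if clear_merged:
        layer.merged_adapters.clear()
```

With `clear_merged=False`, any later change to `layer.b` has no effect on `forward`, which is the silent no-op the review describes. With `clear_merged=True`, the freshly zeroed adapter is live again and subsequent updates show up in the output.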
… delta

Addresses review blockers on PR #2:

B1 (L1 buffer was write-only). `wake_step` now draws `replay_n` samples from the L1 buffer before the forward pass, concatenating them into the current batch. Default `replay_n=0` preserves baseline behavior; the `--replay-n` CLI flag and `HyperParams.replay_n` expose it. Previously the docstring claimed replay was wired but nothing called `buffer.sample`; this removes that lie.

B2 (sleep_phase left peft's `merged_adapters` set). After `layer.merge` we now `merged_adapters.clear()` so the next wake step sees an unmerged adapter and peft's forward actually applies the (freshly zeroed) B @ A delta. Without this, training post-sleep was silently a no-op for the LoRA path. New test `test_post_sleep_wake_actually_updates_lora_params` catches regressions.

PR-analyzer B1 (sleep-merge test only asserted `merged >= 1`). Replaced with `test_sleep_merge_applies_correct_delta_to_fc_head`: snapshots the fc base weight, computes `scale * (alpha/r) * B @ A` by hand, calls `sleep_phase(scale=0.5)`, asserts `allclose` on the actual delta, that lora_B is zeroed, that optimizer state is cleared, and that `merged_adapters` no longer contains the adapter.

Additional cleanup:

- `HyperParams`: add `replay_n`, `__post_init__` validation for `num_classes_per_task` length and non-negative `replay_n`; drop the unused `extra` dict.
- `sleep.py`: use `optimizer.state.clear()` to preserve the defaultdict factory on recent torch; docstring now states `sleep_scale` is ad-hoc (not from paper §2.4) and only scales the delta.
- `model.py`: drop dead `extract_head_state` (no caller); tighten the requires_grad comment.
- Tests: `test_buffer_mixes_multiple_task_ids`, `test_forgetting_zero_when_task_improves`, `test_eval_script_rejects_empty_matrix`, `test_config_rejects_mismatched_task_lengths`. Removed empty `@pytest.mark.usefixtures()` decorator cruft.
- `README.md`: "Known limitations (Phase I MVP)" section explicit about shared head, ad-hoc sleep_scale, missing downsample.0 LoRA targets, and forgetting formula semantics.

Co-Authored-By: Zhuge <z@2lab.ai>
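The B1 fix (replay drawn from the L1 buffer and concatenated into the current batch) can be sketched with the standard library alone. Everything here is illustrative: `TaskBatchRingBuffer` and `build_wake_batch` are hypothetical names rather than the contents of `hal_sleep_wake/memory.py` or `wake.py`, batches are plain lists, and the sketch assumes each buffer entry is one stored batch with `replay_n` counting entries.

```python
import random
from collections import deque


class TaskBatchRingBuffer:
    """Fixed-capacity buffer of (task_id, batch) entries (hypothetical sketch)."""

    def __init__(self, capacity):
        # deque(maxlen=...) evicts the oldest entry once capacity is reached.
        self._items = deque(maxlen=capacity)

    def push(self, task_id, batch):
        self._items.append((task_id, batch))

    def __len__(self):
        return len(self._items)

    def sample(self, n, rng=random):
        # Never ask for more entries than the buffer currently holds.
        k = min(n, len(self._items))
        return rng.sample(list(self._items), k)


def build_wake_batch(current, buffer, replay_n, rng=random):
    """Append examples from up to `replay_n` stored batches to `current`."""
    if replay_n < 0:
        raise ValueError("replay_n must be non-negative")
    replayed = [ex for _, batch in buffer.sample(replay_n, rng) for ex in batch]
    return list(current) + replayed
```

With `replay_n=0` the function returns the current batch unchanged, which matches the stated default of preserving baseline behavior.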
Summary
Phase I MVP scaffolding for the HAL sleep-wake framework on a vision stack (CIFAR-10 → CIFAR-100 continual). Runnable code + smoke tests, no GPU measurement yet (that lands in a follow-up PR).
Vision-stack mapping of the paper's LLM-stack hierarchy (paper §2.3):
- L1: task-batch ring buffer (`hal_sleep_wake/memory.py`)
- L2: LoRA adapter on ResNet-18 `Conv2d` + `fc` (`hal_sleep_wake/model.py`)
- L3: ResNet-18 backbone (`hal_sleep_wake/model.py`)

Sleep cycle in this MVP (paper §2.4 minimum subset):

- Wake: SGD over LoRA-only params on batches from the `DataLoader`; every batch is pushed into the L1 buffer.
- Sleep (NREM): merge LoRA deltas into the base weights (`W ← W + s · (α/r) · B @ A`) via peft's native `layer.merge()` with a scaling-dict patch for the `sleep_scale` safety multiplier, then A = Kaiming / B = zeros re-init. Optimizer state wiped.

REM synthetic-dream generation and entropy-aware context compression (paper §2.4 lines 85–101) are explicit follow-up items.
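The NREM merge rule `W ← W + s · (α/r) · B @ A` can be checked by hand on small matrices. The sketch below uses nested Python lists so it runs without torch or peft; `matmul` and `merge_lora_delta` are hypothetical helpers, not functions from `sleep.py`, and the shapes assume `W` is out × in, `B` is out × r, and `A` is r × in.

```python
def matmul(x, y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora_delta(w, a, b, alpha, r, sleep_scale=1.0):
    """Return W + s * (alpha / r) * (B @ A) without mutating the inputs."""
    factor = sleep_scale * (alpha / r)
    delta = matmul(b, a)
    return [
        [w_ij + factor * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Rank-1 example: alpha=2, r=1, sleep_scale=0.5 gives an overall factor of 1.0.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[3.0, 4.0]]
merged = merge_lora_delta(W, A, B, alpha=2.0, r=1, sleep_scale=0.5)
# merged == [[4.0, 4.0], [6.0, 9.0]]
```

Because B is re-initialized to zeros after the merge, applying the same rule again immediately afterwards leaves `W` unchanged, which is why the next wake phase starts from the consolidated weights.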
What's in this PR
- `hal_sleep_wake/`: `config.py`, `memory.py`, `model.py`, `wake.py`, `sleep.py`, `data.py`, `metrics.py`
- `scripts/train_cifar10_cifar100.py`: entry point with `--dry-run` (imports no torch), `--no-sleep` (baseline for ablation), `--results-out` (JSON dump)
- `scripts/eval.py`: replays a run's JSON and prints the accuracy matrix + avg / forgetting
- `tests/`: 20 unit + smoke tests, all CPU-only, degrade gracefully when torch is unavailable locally (via `pytest.importorskip`)
- `.github/workflows/ci.yml`: Python 3.11, CPU-only torch wheel, runs `ruff check .` + `pytest -q`
- `README.md` + `results/README.md` updated with the Phase I status, install/run/test commands, and a results-placeholder table

Test plan

- `ruff check .`: clean
- `pytest -q` in CI with CPU-only torch wheel
- `python scripts/train_cifar10_cifar100.py --dry-run`: prints the plan without downloads
- Follow-up GPU measurement PR fills `results/baseline_vs_hal.md` with both `--no-sleep` and HAL rows, targeting the paper §5 goal of 50% forgetting reduction

Closes part of #1. Remaining Phase I work (dataset expansion beyond CIFAR, GPU measurement, REM, entropy-aware compression) tracked in that same issue.
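For readers unfamiliar with the continual-learning metrics the test plan targets, here is a minimal sketch of average accuracy and forgetting computed from an accuracy matrix. It is not the code in `metrics.py`, and the PR does not quote the paper §5 formula, so the definition is an assumption: `acc[i][j]` is accuracy on task `j` after training task `i`, and forgetting is the drop from each earlier task's best pre-final accuracy to its final accuracy, clamped at zero.

```python
def _check(acc):
    # An empty matrix (or empty final row) has no well-defined metrics.
    if not acc or not acc[-1]:
        raise ValueError("accuracy matrix is empty")


def average_accuracy(acc):
    """Mean accuracy over all tasks after the final task has been trained."""
    _check(acc)
    final = acc[-1]
    return sum(final) / len(final)


def forgetting(acc):
    """Mean clamped drop from best earlier accuracy to final accuracy."""
    _check(acc)
    last = len(acc) - 1
    if last == 0:
        return 0.0  # a single task cannot be forgotten
    drops = []
    for j in range(last):
        best = max(acc[i][j] for i in range(j, last))
        drops.append(max(0.0, best - acc[last][j]))
    return sum(drops) / len(drops)


# Two-task example: task 0 falls from 0.75 to 0.5 after training task 1.
acc = [[0.75, 0.0], [0.5, 1.0]]
# average_accuracy(acc) == 0.75, forgetting(acc) == 0.25
```

Under this definition a 50% forgetting reduction would mean the HAL row's forgetting value is half that of the `--no-sleep` baseline row. Clamping at zero means a task that improves after later training contributes no forgetting.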
🤖 Generated with Claude Code