Phase I MVP: scaffold + CIFAR-10→CIFAR-100 continual pipeline + smoke tests by icedac · Pull Request #2 · labforadvancedstudy/hal-sleep-wake

icedac · 2026-04-18T10:43:56Z

Summary

Phase I MVP scaffolding for the HAL sleep-wake framework on a vision stack (CIFAR-10 → CIFAR-100 continual). Runnable code + smoke tests, no GPU measurement yet (that lands in a follow-up PR).

Vision-stack mapping of the paper's LLM-stack hierarchy (paper §2.3):

| Layer | Paper (LLM stack) | This MVP (vision stack) |
| --- | --- | --- |
| L1 | Context window | Task-batch ring buffer (`hal_sleep_wake/memory.py`) |
| L2 | LoRA weights | LoRA adapter on `Conv2d` + `fc` (`hal_sleep_wake/model.py`) |
| L3 | Small model (7B) | ResNet-18 base weights (`hal_sleep_wake/model.py`) |
| L4 | Large model (70B) | Out of scope (Phase II+) |
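
As a rough illustration of the L1 row, a task-batch ring buffer can be sketched with a bounded `deque`. This is a minimal sketch, not the contents of `hal_sleep_wake/memory.py`: the class name `TaskBatchRingBuffer`, its `push`/`sample` signatures, and the string "batches" are assumptions made for the example.

```python
import random
from collections import deque


class TaskBatchRingBuffer:
    """Fixed-capacity FIFO of (task_id, batch) pairs; oldest entries are evicted first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items = deque(maxlen=capacity)  # deque drops the oldest item when full
        self._rng = random.Random(seed)

    def push(self, task_id: int, batch) -> None:
        self._items.append((task_id, batch))

    def sample(self, n: int) -> list:
        """Return up to n stored entries, drawn without replacement."""
        n = min(n, len(self._items))
        return self._rng.sample(list(self._items), n)

    def __len__(self) -> int:
        return len(self._items)


# Hypothetical usage: five pushes into a buffer of capacity three.
buf = TaskBatchRingBuffer(capacity=3)
for step in range(5):
    buf.push(task_id=step // 3, batch=f"batch-{step}")

stored = sorted(batch for _, batch in buf.sample(3))
```

With capacity 3, only the three most recent batches survive, so entries from more than one task can coexist in the buffer once a task boundary is crossed.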

Sleep cycle in this MVP (paper §2.4 minimum subset):

Wake: SGD over LoRA-only params on the current task's DataLoader; every batch is pushed into the L1 buffer.
Sleep (NREM): at the end of each task, LoRA deltas are merged into the ResNet base weights (W ← W + s · (α/r) · B @ A, where s is the sleep_scale safety multiplier, α the LoRA alpha, and r the rank) via peft's native layer.merge(), with the layer's scaling entry patched to apply s. The adapter is then re-initialized (A = Kaiming, B = zeros) and the optimizer state is wiped.
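
The merge arithmetic above can be checked by hand on a tiny example. The following is a plain-Python sketch of the update rule only, written without torch or peft so it runs anywhere; `matmul` and `sleep_merge` are hypothetical helper names, the matrices are made-up values, and the Kaiming re-init of A is omitted.

```python
def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def sleep_merge(W, A, B, alpha, r, sleep_scale=1.0):
    """Return (merged W, zeroed B) for one LoRA-adapted layer.

    W is out x in, B is out x r, A is r x in.
    """
    factor = sleep_scale * (alpha / r)
    delta = matmul(B, A)
    merged = [
        [w + factor * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]
    zero_B = [[0.0] * len(row) for row in B]  # B = 0 means the next delta starts at zero
    return merged, zero_B


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # r = 1, in = 2
B = [[0.5], [-1.0]]   # out = 2, r = 1
merged, B_new = sleep_merge(W, A, B, alpha=2.0, r=1, sleep_scale=0.5)
```

Here the factor is 0.5 · (2/1) = 1, so the merged weight is simply W + B @ A; with `sleep_scale=1.0` the same delta would be applied at twice the strength.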

REM synthetic-dream generation and entropy-aware context compression (paper §2.4 lines 85–101) are explicit follow-up items.

What's in this PR

hal_sleep_wake/ — config.py, memory.py, model.py, wake.py, sleep.py, data.py, metrics.py
scripts/train_cifar10_cifar100.py — entry point with --dry-run (imports no torch), --no-sleep (baseline for ablation), --results-out (JSON dump)
scripts/eval.py — replays a run's JSON and prints the accuracy matrix + avg / forgetting
tests/ — 20 unit + smoke tests, all CPU-only, degrade gracefully when torch is unavailable locally (via pytest.importorskip)
.github/workflows/ci.yml — Python 3.11, CPU-only torch wheel, runs ruff check . + pytest -q
README.md + results/README.md updated with the Phase I status, install/run/test commands, and a results-placeholder table
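
For readers unfamiliar with the continual-learning metrics that `scripts/eval.py` prints, here is one common way to compute them from an accuracy matrix. It is a sketch under assumptions, not the code in `metrics.py`: the function names are hypothetical, the numbers are made up, and clamping negative drops to zero is a choice of this example (the PR notes that the forgetting formula's semantics are a known limitation).

```python
def average_accuracy(acc):
    """Mean accuracy over all tasks after training on the last task."""
    final = acc[-1]
    return sum(final) / len(final)


def forgetting(acc):
    """Mean drop from each earlier task's best past accuracy to its final accuracy."""
    if len(acc) < 2:
        return 0.0
    final = acc[-1]
    drops = []
    for j in range(len(acc) - 1):  # the last task cannot have been forgotten yet
        best_past = max(acc[i][j] for i in range(j, len(acc) - 1))
        drops.append(max(0.0, best_past - final[j]))  # assumption: improvement counts as zero
    return sum(drops) / len(drops)


# acc[i][j] = accuracy on task j after training through task i (made-up numbers)
acc = [
    [0.80, 0.00],
    [0.60, 0.50],
]
avg = average_accuracy(acc)
forg = forgetting(acc)
```

In this toy matrix, accuracy on the first task falls from 0.80 to 0.60 after the second task is learned, which is the quantity the forgetting metric is meant to capture.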

Test plan

ruff check . — clean
pytest -q in CI with CPU-only torch wheel
python scripts/train_cifar10_cifar100.py --dry-run — prints the plan without downloads
Follow-up PR: real GPU run that fills results/baseline_vs_hal.md with both --no-sleep and HAL rows, targeting the paper §5 goal of 50% forgetting reduction
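
A `--dry-run` mode that never imports torch is usually achieved by deferring the heavy import until after argument parsing. The sketch below shows that pattern with the three flags named in this PR; the `plan` dictionary and the structure of `main` are assumptions for illustration, not the actual entry point.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CIFAR-10 -> CIFAR-100 continual run (sketch)")
    parser.add_argument("--dry-run", action="store_true", help="print the plan and exit")
    parser.add_argument("--no-sleep", action="store_true", help="baseline without the sleep phase")
    parser.add_argument("--results-out", default=None, help="path for the JSON results dump")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    # Hypothetical plan contents; the real script may print something different.
    plan = {
        "tasks": ["cifar10", "cifar100"],
        "sleep": not args.no_sleep,
        "results_out": args.results_out,
    }
    if args.dry_run:
        print(f"plan: {plan}")
        return 0
    import torch  # noqa: F401  (deferred so that --dry-run never needs torch)
    # ... the real training loop would start here ...
    return 0


exit_code = main(["--dry-run", "--no-sleep"])
```

Because the import sits below the early return, a machine without torch installed can still validate the CLI and print the plan.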

Closes part of #1. Remaining Phase I work (dataset expansion beyond CIFAR, GPU measurement, REM, entropy-aware compression) tracked in that same issue.

🤖 Generated with Claude Code

…smoke tests

Implements the minimum viable subset of the HAL sleep-wake framework (paper §2.4) on a vision stack for Phase I measurement:

- L1 task-batch ring buffer (`hal_sleep_wake/memory.py`)
- L2 LoRA adapter on ResNet-18 `Conv2d` + `fc` (`hal_sleep_wake/model.py`)
- L3 ResNet-18 backbone: fresh init, no pretrained weights
- Wake: SGD over LoRA-only params, every batch pushed into L1 (`wake.py`)
- Sleep (NREM): merge LoRA deltas into base weights with a safety `sleep_scale` multiplier, re-init A=Kaiming/B=zeros, wipe optimizer state (`sleep.py`)
- Continual metrics: accuracy matrix + average accuracy + forgetting rate per paper §5 (`metrics.py`)
- CIFAR-10 → CIFAR-100 dry-run-able entry point (`scripts/train_cifar10_cifar100.py`, `--dry-run` imports no torch)
- Eval helper (`scripts/eval.py`) for replaying a run's JSON dump

Tests (CPU, no dataset download):

- `tests/test_memory.py`: 5 tests
- `tests/test_metrics.py`: 7 tests
- `tests/test_model.py`: 3 tests (importorskip on torch/peft)
- `tests/test_smoke.py`: 5 tests including a dry-run subprocess and an end-to-end sleep-merge roundtrip

CI (`.github/workflows/ci.yml`): Python 3.11, CPU-only torch wheel index, runs `ruff check .` + `pytest -q`.

Scope boundary: Phase I MVP only. REM synthetic-dream generation and entropy-aware context compression (paper §2.4 lines 85–101) land in follow-up PRs. No GPU run numbers yet, since CI has no GPU; the measurement PR will fill `results/baseline_vs_hal.md`.

Closes part of #1.

Co-Authored-By: Zhuge <z@2lab.ai>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f2d8db88f


chatgpt-codex-connector · 2026-04-18T10:47:41Z

+
+def reset_optimizer_state(optimizer: torch.optim.Optimizer) -> None:
+    """Wipe optimizer accumulators (momentum, second moments, ...)."""
+    optimizer.state = {}


Clear optimizer state without changing its mapping type

reset_optimizer_state assigns optimizer.state = {}, which replaces PyTorch's expected state mapping with a plain dict. In the default flow (train_cifar10_cifar100.py with sleep enabled), sleep runs after task 0 and the next optimizer.step() on task 1 will look up per-parameter state via self.state[p]; with a plain dict this raises KeyError, so the main HAL training path crashes before finishing the sequence.
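
The difference between rebinding and clearing can be seen without torch. `torch.optim.Optimizer` creates its `state` as a `defaultdict(dict)`; the sketch below uses a bare `defaultdict` as a stand-in (the `"param0"` key and momentum value are illustrative) to show what is lost when the mapping is replaced by `{}`. Whether a given optimizer actually raises depends on how it accesses its state.

```python
from collections import defaultdict

# Stand-in for torch.optim.Optimizer.state, which torch creates as defaultdict(dict).
state = defaultdict(dict)
state["param0"]["momentum_buffer"] = 0.9

# Replacing the mapping with a plain dict loses the default factory.
plain = {}
try:
    plain["param0"]
    plain_lookup_ok = True
except KeyError:
    plain_lookup_ok = False

# Clearing in place keeps the factory, so a fresh lookup yields an empty per-param dict.
state.clear()
fresh = state["param0"]
```

Clearing in place also keeps any other references to the same mapping valid, which rebinding the attribute does not.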


chatgpt-codex-connector · 2026-04-18T10:47:41Z

+            original = layer.scaling[adapter_name]
+            layer.scaling[adapter_name] = original * float(scale)
+            try:
+                layer.merge(adapter_names=[adapter_name])


Unmerge adapter after consolidation to keep wake updates active

After calling layer.merge(...), the code never unmerges or clears merged adapter state before continuing training. PEFT LoRA layers skip adapter contributions in forward when marked merged, so after the first sleep cycle subsequent wake steps update LoRA tensors that are no longer applied; later sleep cycles also fail to consolidate new deltas. This makes continual learning behavior incorrect after task 0.
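
The failure mode is easy to reproduce with a scalar toy model. `ToyLoRALayer` below is a hypothetical stand-in, not peft's class; it only mimics the behavior the review describes, namely that a layer marked as merged skips the adapter path in its forward pass.

```python
class ToyLoRALayer:
    """Scalar stand-in for a LoRA-wrapped layer; not peft's real class."""

    def __init__(self, w: float, a: float, scaling: float) -> None:
        self.w, self.a, self.b = w, a, 0.0  # b starts at zero, so the delta starts at zero
        self.scaling = scaling
        self.merged_adapters = []

    def forward(self, x: float) -> float:
        if self.merged_adapters:  # merged: the adapter path is skipped
            return self.w * x
        return (self.w + self.scaling * self.b * self.a) * x

    def merge(self, name: str = "default") -> None:
        self.w += self.scaling * self.b * self.a
        self.merged_adapters.append(name)


layer = ToyLoRALayer(w=1.0, a=2.0, scaling=0.5)
layer.b = 1.0                  # pretend a wake phase trained b
layer.merge()                  # sleep: fold the delta into w, giving w = 2.0
layer.b = 0.0                  # re-init b

layer.b = 3.0                  # the next wake phase trains b again
stuck = layer.forward(1.0)     # adapter ignored while still marked merged

layer.merged_adapters.clear()  # drop the merged marker
active = layer.forward(1.0)    # adapter applied again
```

While the marker is set, changes to `b` have no effect on the output, so gradients would train parameters that the forward pass never uses.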


… delta

Addresses review blockers on PR #2:

B1 (L1 buffer was write-only). `wake_step` now draws `replay_n` samples from the L1 buffer before the forward pass, concatenating them into the current batch. Default `replay_n=0` preserves baseline behavior; the `--replay-n` CLI flag and `HyperParams.replay_n` expose it. Previously the docstring claimed replay was wired but nothing called `buffer.sample`; this removes that lie.

B2 (sleep_phase left peft's `merged_adapters` set). After `layer.merge` we now `merged_adapters.clear()` so the next wake step sees an unmerged adapter and peft's forward actually applies the (freshly zeroed) B @ A delta. Without this, training post-sleep was silently a no-op for the LoRA path. New test `test_post_sleep_wake_actually_updates_lora_params` catches regressions.

PR-analyzer B1 (sleep-merge test only asserted `merged >= 1`). Replaced with `test_sleep_merge_applies_correct_delta_to_fc_head`: snapshots the fc base weight, computes `scale * (alpha/r) * B @ A` by hand, calls `sleep_phase(scale=0.5)`, asserts `allclose` on the actual delta, that lora_B is zeroed, that optimizer state is cleared, and that `merged_adapters` no longer contains the adapter.

Additional cleanup:

- `HyperParams`: add `replay_n`, `__post_init__` validation for `num_classes_per_task` length and non-negative `replay_n`; drop the unused `extra` dict.
- `sleep.py`: use `optimizer.state.clear()` to preserve the defaultdict factory on recent torch; docstring now states `sleep_scale` is ad-hoc (not from paper §2.4) and only scales the delta.
- `model.py`: drop dead `extract_head_state` (no caller); tighten the requires_grad comment.
- Tests: `test_buffer_mixes_multiple_task_ids`, `test_forgetting_zero_when_task_improves`, `test_eval_script_rejects_empty_matrix`, `test_config_rejects_mismatched_task_lengths`. Removed empty `@pytest.mark.usefixtures()` decorator cruft.
- `README.md`: "Known limitations (Phase I MVP)" section explicit about shared head, ad-hoc sleep_scale, missing downsample.0 LoRA targets, and forgetting formula semantics.

Co-Authored-By: Zhuge <z@2lab.ai>
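
The `__post_init__` validation mentioned in the cleanup list might look roughly like the following. This is a sketch under assumptions: the commit names only `num_classes_per_task` and `replay_n`, so the `tasks` field, the default values, the error messages, and the `rejects` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HyperParams:
    # `tasks` is a hypothetical field; the source names only the two validated ones.
    tasks: list = field(default_factory=lambda: ["cifar10", "cifar100"])
    num_classes_per_task: list = field(default_factory=lambda: [10, 100])
    replay_n: int = 0

    def __post_init__(self) -> None:
        if len(self.num_classes_per_task) != len(self.tasks):
            raise ValueError("num_classes_per_task must have one entry per task")
        if self.replay_n < 0:
            raise ValueError("replay_n must be non-negative")


def rejects(**kwargs) -> bool:
    """Return True if constructing HyperParams with these overrides raises ValueError."""
    try:
        HyperParams(**kwargs)
    except ValueError:
        return True
    return False


default_ok = not rejects()
bad_lengths = rejects(num_classes_per_task=[10])
bad_replay = rejects(replay_n=-1)
```

Validating at construction time means a misconfigured run fails before any data is loaded, which also makes the behavior easy to cover with a CPU-only unit test.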

chatgpt-codex-connector Bot reviewed Apr 18, 2026


icedac merged commit 55c9214 into main Apr 18, 2026
1 check failed

icedac deleted the feat/mvp-scaffold-cifar-continual branch April 18, 2026 11:09

icedac merged 2 commits into main from feat/mvp-scaffold-cifar-continual
