Phase I MVP: scaffold + CIFAR-10→CIFAR-100 continual pipeline + smoke tests#2

Merged: icedac merged 2 commits into main from feat/mvp-scaffold-cifar-continual, Apr 18, 2026

icedac (Contributor) commented Apr 18, 2026

Summary

Phase I MVP scaffolding for the HAL sleep-wake framework on a vision stack (CIFAR-10 → CIFAR-100 continual). Runnable code + smoke tests, no GPU measurement yet (that lands in a follow-up PR).

Vision-stack mapping of the paper's LLM-stack hierarchy (paper §2.3):

| Layer | Paper (LLM stack) | This MVP (vision stack) |
|---|---|---|
| L1 | Context window | Task-batch ring buffer (hal_sleep_wake/memory.py) |
| L2 | LoRA weights | LoRA adapter on Conv2d + fc (hal_sleep_wake/model.py) |
| L3 | Small model (7B) | ResNet-18 base weights (hal_sleep_wake/model.py) |
| L4 | Large model (70B) | Out of scope: Phase II+ |

Sleep cycle in this MVP (paper §2.4 minimum subset):

  1. Wake: SGD over LoRA-only params on the current task's DataLoader; every batch is pushed into the L1 buffer.
  2. Sleep (NREM): at the end of each task, LoRA deltas are merged into the ResNet base weights (W ← W + s · (α/r) · B @ A, where s is the sleep_scale safety multiplier) via peft's native layer.merge(), with the layer's scaling dict patched for the duration of the merge to apply s. The adapter is then re-initialized (A = Kaiming, B = zeros) and the optimizer state is wiped.

REM synthetic-dream generation and entropy-aware context compression (paper §2.4 lines 85–101) are explicit follow-up items.
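
The NREM merge rule above can be illustrated with a dependency-free sketch. This is not the code in `hal_sleep_wake/sleep.py` (which goes through peft); the helper names `matmul` and `sleep_merge`, the list-of-lists matrices, and the exact Kaiming variant (uniform with bound sqrt(6 / fan_in)) are assumptions made for illustration only.

```python
import math
import random

Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a tiny illustration."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def sleep_merge(W: Matrix, A: Matrix, B: Matrix, alpha: float, r: int,
                sleep_scale: float, rng: random.Random) -> tuple[Matrix, Matrix, Matrix]:
    """Return (merged W, re-initialized A, zeroed B) for one layer."""
    factor = sleep_scale * (alpha / r)
    delta = matmul(B, A)  # shape (out, in), same as W
    merged = [[w + factor * d for w, d in zip(w_row, d_row)]
              for w_row, d_row in zip(W, delta)]
    # Re-init: A gets a Kaiming-style uniform draw (assumed variant), B goes
    # back to zeros, so B @ A == 0 and the next wake phase starts from `merged`.
    bound = math.sqrt(6.0 / len(A[0]))
    new_A = [[rng.uniform(-bound, bound) for _ in row] for row in A]
    new_B = [[0.0 for _ in row] for row in B]
    return merged, new_A, new_B


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # r x in  (r = 1)
B = [[0.5], [1.0]]    # out x r
merged_W, new_A, new_B = sleep_merge(W, A, B, alpha=2.0, r=1,
                                     sleep_scale=0.5, rng=random.Random(0))
```

With `alpha=2`, `r=1` and `sleep_scale=0.5` the effective factor is 1, so `merged_W` is simply `W + B @ A`; a smaller `sleep_scale` would merge only part of the delta.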

What's in this PR

  • hal_sleep_wake/config.py, memory.py, model.py, wake.py, sleep.py, data.py, metrics.py
  • scripts/train_cifar10_cifar100.py — entry point with --dry-run (imports no torch), --no-sleep (baseline for ablation), --results-out (JSON dump)
  • scripts/eval.py — replays a run's JSON and prints the accuracy matrix + avg / forgetting
  • tests/ — 20 unit + smoke tests, all CPU-only, degrade gracefully when torch is unavailable locally (via pytest.importorskip)
  • .github/workflows/ci.yml — Python 3.11, CPU-only torch wheel, runs ruff check . + pytest -q
  • README.md + results/README.md updated with the Phase I status, install/run/test commands, and a results-placeholder table
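
For readers unfamiliar with the continual-learning metrics that `scripts/eval.py` reports, here is a minimal sketch of one common convention. It assumes `acc[i][j]` is the accuracy on task j measured after training on task i, and that forgetting is clamped at zero when a task improves; the formula actually used in `metrics.py` (paper §5) may differ.

```python
def average_accuracy(acc: list[list[float]]) -> float:
    """Mean accuracy over all tasks after training on the last task."""
    final = acc[-1]
    return sum(final) / len(final)


def forgetting(acc: list[list[float]]) -> float:
    """Mean drop from each earlier task's best past accuracy to its final accuracy."""
    if len(acc) < 2:
        return 0.0
    final = acc[-1]
    drops = []
    for j in range(len(acc) - 1):  # the last task has had no chance to be forgotten
        best_past = max(acc[i][j] for i in range(j, len(acc) - 1))
        drops.append(max(0.0, best_past - final[j]))  # clamp: improvement counts as 0
    return sum(drops) / len(drops)


# Rows: after task 0, after task 1. Columns: accuracy on task 0, task 1.
acc_matrix = [
    [0.8, 0.1],
    [0.6, 0.7],
]
avg = average_accuracy(acc_matrix)
forget = forgetting(acc_matrix)
```

In this made-up matrix, task 0 falls from 0.8 to 0.6 after task 1 is learned, so the forgetting value is about 0.2 and the final average accuracy is about 0.65.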

Test plan

  • ruff check . — clean
  • pytest -q in CI with CPU-only torch wheel
  • python scripts/train_cifar10_cifar100.py --dry-run — prints the plan without downloads
  • Follow-up PR: real GPU run that fills results/baseline_vs_hal.md with both --no-sleep and HAL rows, targeting the paper §5 goal of 50% forgetting reduction
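
One way a `--dry-run` path can avoid importing torch is to defer the heavy import until after the flag is checked. The sketch below shows that pattern only; the flag names come from this PR, while the plan strings and the `build_parser` / `main` layout are hypothetical and not taken from `scripts/train_cifar10_cifar100.py`.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CIFAR-10 -> CIFAR-100 continual run")
    parser.add_argument("--dry-run", action="store_true", help="print the plan and exit")
    parser.add_argument("--no-sleep", action="store_true", help="baseline: skip the sleep phase")
    parser.add_argument("--results-out", default=None, help="path for the JSON results dump")
    return parser


def main(argv: Optional[list[str]] = None) -> list[str]:
    args = build_parser().parse_args(argv)
    plan = [
        "task 0: CIFAR-10",
        "task 1: CIFAR-100",
        "sleep: off (baseline)" if args.no_sleep else "sleep: after each task",
    ]
    if args.dry_run:
        for line in plan:
            print(line)
        return plan
    # Heavy imports happen only on the real path, so --dry-run works without torch.
    import torch  # noqa: F401  (deferred on purpose)
    raise NotImplementedError("training loop omitted from this sketch")


dry_plan = main(["--dry-run", "--no-sleep"])
```

Because the import sits below the early return, a machine without torch installed can still run the dry-run branch and see the plan.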

Closes part of #1. Remaining Phase I work (dataset expansion beyond CIFAR, GPU measurement, REM, entropy-aware compression) tracked in that same issue.

🤖 Generated with Claude Code

…smoke tests

Implements the minimum viable subset of the HAL sleep-wake framework
(paper §2.4) on a vision stack for Phase I measurement:

- L1 task-batch ring buffer (`hal_sleep_wake/memory.py`)
- L2 LoRA adapter on ResNet-18 `Conv2d` + `fc` (`hal_sleep_wake/model.py`)
- L3 ResNet-18 backbone — fresh init, no pretrained weights
- Wake: SGD over LoRA-only params, every batch pushed into L1 (`wake.py`)
- Sleep (NREM): merge LoRA deltas into base weights with a safety
  `sleep_scale` multiplier, re-init A=Kaiming/B=zeros, wipe optimizer
  state (`sleep.py`)
- Continual metrics: accuracy matrix + average accuracy + forgetting
  rate per paper §5 (`metrics.py`)
- CIFAR-10 → CIFAR-100 dry-run-able entry point
  (`scripts/train_cifar10_cifar100.py`, `--dry-run` imports no torch)
- Eval helper (`scripts/eval.py`) for replaying a run's JSON dump

Tests (CPU, no dataset download):
- `tests/test_memory.py`         — 5 tests
- `tests/test_metrics.py`        — 7 tests
- `tests/test_model.py`          — 3 tests (importorskip on torch/peft)
- `tests/test_smoke.py`          — 5 tests including a dry-run
  subprocess and an end-to-end sleep-merge roundtrip

CI (`.github/workflows/ci.yml`): Python 3.11, CPU-only torch wheel
index, runs `ruff check .` + `pytest -q`.

Scope boundary: Phase I MVP only. REM synthetic-dream generation and
entropy-aware context compression (paper §2.4 lines 85–101) land in
follow-up PRs. No GPU run numbers yet — CI has no GPU; measurement PR
will fill `results/baseline_vs_hal.md`.

Closes part of #1.

Co-Authored-By: Zhuge <z@2lab.ai>

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f2d8db88f

Comment thread hal_sleep_wake/sleep.py Outdated

def reset_optimizer_state(optimizer: torch.optim.Optimizer) -> None:
    """Wipe optimizer accumulators (momentum, second moments, ...)."""
    optimizer.state = {}

P0: Clear optimizer state without changing its mapping type

reset_optimizer_state assigns optimizer.state = {}, which replaces PyTorch's expected state mapping with a plain dict. In the default flow (train_cifar10_cifar100.py with sleep enabled), sleep runs after task 0 and the next optimizer.step() on task 1 will look up per-parameter state via self.state[p]; with a plain dict this raises KeyError, so the main HAL training path crashes before finishing the sequence.
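
The failure mode described here can be reproduced without torch. PyTorch optimizers keep `state` as a `defaultdict(dict)`; the `ToyOptimizer` class below is a hypothetical stand-in that copies only that detail, so it is a sketch of the mechanism rather than of any real optimizer.

```python
from collections import defaultdict


class ToyOptimizer:
    """Stand-in for an optimizer whose per-parameter state lives in a defaultdict."""

    def __init__(self) -> None:
        self.state: dict = defaultdict(dict)

    def step(self, param: str) -> int:
        # Mirrors the lookup pattern `state = self.state[p]` used inside step().
        state = self.state[param]
        state["step"] = state.get("step", 0) + 1
        return state["step"]


opt = ToyOptimizer()
opt.step("w")

# Rebinding to a plain dict drops the default factory: the next lookup fails.
opt.state = {}
try:
    opt.step("w")
    rebind_ok = True
except KeyError:
    rebind_ok = False

# Clearing in place keeps the defaultdict, so the next step starts from scratch.
opt.state = defaultdict(dict)
opt.step("w")
opt.state.clear()
steps_after_clear = opt.step("w")
```

Clearing in place wipes the accumulators while keeping the mapping type that `step()` relies on.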

Comment thread hal_sleep_wake/sleep.py Outdated
original = layer.scaling[adapter_name]
layer.scaling[adapter_name] = original * float(scale)
try:
    layer.merge(adapter_names=[adapter_name])

P1: Unmerge adapter after consolidation to keep wake updates active

After calling layer.merge(...), the code never unmerges or clears merged adapter state before continuing training. PEFT LoRA layers skip adapter contributions in forward when marked merged, so after the first sleep cycle subsequent wake steps update LoRA tensors that are no longer applied; later sleep cycles also fail to consolidate new deltas. This makes continual learning behavior incorrect after task 0.
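
A scalar toy model makes the effect concrete. `ToyLoraLayer` below is a hypothetical class that imitates only the behavior described in this comment (the adapter term is skipped in forward while the layer is marked merged); it is not peft code and its method signatures are simplified.

```python
class ToyLoraLayer:
    """Scalar stand-in for a LoRA layer: y = w*x + scaling*b*a*x unless merged."""

    def __init__(self, w: float, a: float, b: float, scaling: float) -> None:
        self.w, self.a, self.b, self.scaling = w, a, b, scaling
        self.merged_adapters: list[str] = []

    @property
    def merged(self) -> bool:
        return bool(self.merged_adapters)

    def merge(self, name: str = "default") -> None:
        self.w += self.scaling * self.b * self.a
        self.merged_adapters.append(name)

    def forward(self, x: float) -> float:
        out = self.w * x
        if not self.merged:  # adapter path is skipped while marked merged
            out += self.scaling * self.b * self.a * x
        return out


layer = ToyLoraLayer(w=1.0, a=2.0, b=0.5, scaling=1.0)
layer.merge()                  # w becomes 2.0
layer.a, layer.b = 1.0, 0.0    # re-init: b = 0, so the delta restarts at zero
layer.b = 0.25                 # pretend a wake step updated b

stuck = layer.forward(1.0)     # the update to b has no effect on the output
layer.merged_adapters.clear()
live = layer.forward(1.0)      # the adapter contributes again
```

While the merged marker is set, `stuck` equals the base-only output; once the marker is cleared, `live` includes the new `b * a` term.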

… delta

Addresses review blockers on PR #2:

B1 (L1 buffer was write-only). `wake_step` now draws `replay_n` samples
from the L1 buffer before the forward pass, concatenating them into the
current batch. Default `replay_n=0` preserves baseline behavior; the
`--replay-n` CLI flag and `HyperParams.replay_n` expose it. Previously
the docstring claimed replay was wired but nothing called
`buffer.sample`; this removes that lie.
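
A minimal sketch of the replay wiring described in B1, using only the standard library. The class name `TaskBatchBuffer`, the helper `wake_batch`, the uniform sampling, and the choice to push after sampling are assumptions; the real `memory.py` and `wake.py` operate on tensors and may order things differently.

```python
import random
from collections import deque


class TaskBatchBuffer:
    """Fixed-capacity ring buffer of (task_id, sample) pairs; oldest entries drop first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._items: deque = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, task_id: int, samples: list) -> None:
        self._items.extend((task_id, s) for s in samples)

    def sample(self, n: int) -> list:
        n = min(n, len(self._items))
        return self._rng.sample(list(self._items), n)

    def __len__(self) -> int:
        return len(self._items)


def wake_batch(buffer: TaskBatchBuffer, task_id: int, batch: list, replay_n: int = 0) -> list:
    """Build the training batch: current samples plus up to replay_n replayed ones."""
    replayed = [s for _, s in buffer.sample(replay_n)] if replay_n > 0 else []
    buffer.push(task_id, batch)  # every batch is still recorded in L1
    return batch + replayed


buf = TaskBatchBuffer(capacity=4)
first = wake_batch(buf, 0, ["a", "b", "c"])             # replay_n=0: baseline behavior
second = wake_batch(buf, 1, ["x", "y"], replay_n=2)     # two task-0 samples appended
```

With `replay_n=0` the batch passes through unchanged, which matches the stated default of preserving baseline behavior.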

B2 (sleep_phase left peft's `merged_adapters` set). After `layer.merge`
we now `merged_adapters.clear()` so the next wake step sees an unmerged
adapter and peft's forward actually applies the (freshly zeroed) B @ A
delta. Without this, training post-sleep was silently a no-op for the
LoRA path. New test `test_post_sleep_wake_actually_updates_lora_params`
catches regressions.

PR-analyzer B1 (sleep-merge test only asserted `merged >= 1`). Replaced
with `test_sleep_merge_applies_correct_delta_to_fc_head`: snapshots the
fc base weight, computes `scale * (alpha/r) * B @ A` by hand, calls
`sleep_phase(scale=0.5)`, asserts `allclose` on the actual delta, that
lora_B is zeroed, that optimizer state is cleared, and that
`merged_adapters` no longer contains the adapter.

Additional cleanup:
- `HyperParams`: add `replay_n`, `__post_init__` validation for
  `num_classes_per_task` length and non-negative `replay_n`; drop the
  unused `extra` dict.
- `sleep.py`: use `optimizer.state.clear()` to preserve the defaultdict
  factory on recent torch; docstring now states `sleep_scale` is ad-hoc
  (not from paper §2.4) and only scales the delta.
- `model.py`: drop dead `extract_head_state` (no caller); tighten the
  requires_grad comment.
- Tests: `test_buffer_mixes_multiple_task_ids`,
  `test_forgetting_zero_when_task_improves`,
  `test_eval_script_rejects_empty_matrix`,
  `test_config_rejects_mismatched_task_lengths`. Removed empty
  `@pytest.mark.usefixtures()` decorator cruft.
- `README.md`: "Known limitations (Phase I MVP)" section explicit about
  shared head, ad-hoc sleep_scale, missing downsample.0 LoRA targets,
  and forgetting formula semantics.
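
The `__post_init__` validation mentioned under `HyperParams` could look roughly like the sketch below. Only `replay_n` and `num_classes_per_task` are named in this commit; the `task_names` field used as the length reference, the defaults, and the error messages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HyperParams:
    # `task_names` is a hypothetical field used here as the length reference.
    task_names: tuple[str, ...] = ("cifar10", "cifar100")
    num_classes_per_task: tuple[int, ...] = (10, 100)
    replay_n: int = 0

    def __post_init__(self) -> None:
        if len(self.num_classes_per_task) != len(self.task_names):
            raise ValueError(
                f"num_classes_per_task has {len(self.num_classes_per_task)} entries, "
                f"expected {len(self.task_names)}"
            )
        if self.replay_n < 0:
            raise ValueError(f"replay_n must be >= 0, got {self.replay_n}")


def is_rejected(**kwargs) -> bool:
    """Return True when HyperParams refuses the given arguments."""
    try:
        HyperParams(**kwargs)
    except ValueError:
        return True
    return False
```

Validating at construction time means a bad config fails before any dataset or model is touched.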

Co-Authored-By: Zhuge <z@2lab.ai>
@icedac icedac merged commit 55c9214 into main Apr 18, 2026
1 check failed
@icedac icedac deleted the feat/mvp-scaffold-cifar-continual branch April 18, 2026 11:09