Reference implementation anchor for:
"Hierarchical Abstraction is All You Need: A Mathematical Framework for Continuous AI Consciousness Through Sleep-Wake Cycles" Jihyuk Im, Claude Opus 4. June 2025.
Phase I MVP scaffolding. The code runs and the tests pass, but benchmark numbers have not been measured yet (CI has no GPU). See results/README.md.
Vision-stack mapping of the paper's LLM-stack hierarchy (paper §2.3):
| Layer | Paper (LLM stack) | This MVP (vision stack) |
|---|---|---|
| L1 | Context window | Task-batch ring buffer (hal_sleep_wake/memory.py) |
| L2 | LoRA weights | LoRA adapter on Conv2d + fc (hal_sleep_wake/model.py) |
| L3 | Small model (7B) | ResNet-18 base weights (hal_sleep_wake/model.py) |
| L4 | Large model (70B) | Out of scope — Phase II+ |
Task sequence (subset of paper §3.1): CIFAR-10 → CIFAR-100. Remaining datasets land in follow-up PRs.
    pip install -e '.[dev]'

torch + torchvision are heavy dependencies; for CPU-only use, install them from the PyTorch CPU wheel index first:

    pip install --index-url https://download.pytorch.org/whl/cpu torch torchvision
    pip install -e '.[dev]'

Dry-run (prints the planned loop without downloading data or building a model):

    python scripts/train_cifar10_cifar100.py --dry-run

Real training (needs a GPU + dataset download):

    python scripts/train_cifar10_cifar100.py \
        --data-root ./data --download --device cuda \
        --results-out results/runs/hal.json
    python scripts/eval.py results/runs/hal.json

`--no-sleep` disables the consolidation phase (baseline for the ablation row). `--replay-n N` draws N samples from the L1 buffer into each wake step (0 = off, the default). Both knobs share the same code path, so baseline and HAL runs are directly comparable.
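As a rough illustration of the L1 buffer and the replay draw described above, here is a minimal, dependency-free sketch. The `RingBuffer` class, its `push`/`sample` methods, and the choice to sample whole stored items without replacement are assumptions made for this example; the actual interface lives in hal_sleep_wake/memory.py and may differ.

```python
import random
from collections import deque


class RingBuffer:
    """Fixed-capacity FIFO store: the oldest item is evicted once full."""

    def __init__(self, capacity, seed=0):
        self._items = deque(maxlen=capacity)  # deque drops the oldest entry itself
        self._rng = random.Random(seed)       # private RNG keeps draws reproducible

    def push(self, item):
        self._items.append(item)

    def sample(self, n):
        """Return up to n stored items without replacement (n = 0 means no replay)."""
        n = min(n, len(self._items))
        return self._rng.sample(list(self._items), n)

    def __len__(self):
        return len(self._items)


buf = RingBuffer(capacity=3)
for step in range(5):
    buf.push(f"batch-{step}")  # batch-0 and batch-1 are evicted along the way

replayed = buf.sample(2)       # what a hypothetical --replay-n 2 step might draw
```

With a capacity of 3 and five pushes, only the three most recent batches remain available for replay.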
The following limitations are flagged explicitly so that the follow-up PRs target the right gaps, and so that the eventual measurement numbers aren't read as something they're not:
- Shared classifier head. The model uses `max(num_classes_per_task) = 100` logits for both tasks. CIFAR-10 trains against the first 10 positions; the remaining 90 logits are not masked during eval. Per-task heads land in a follow-up PR.
- `sleep_scale` is ad-hoc. Paper §2.4 does not define this scalar. It's an ablation knob only; leaving it at `1.0` reproduces the paper's unscaled `W' = W + (α/r) · B @ A`.
- LoRA targets omit residual projections. `("conv1", "conv2", "fc")` misses ResNet-18's `downsample.0` 1×1 convs on the residual path. Documented choice, not a bug.
- Forgetting rate formula. Uses `max_{j ≤ T-1} A[i,j] − A[i,T-1]` (includes the final round in the max). Stays non-negative even if a task improves on its last round; see `tests/test_metrics.py::test_forgetting_zero_when_task_improves`.
- No GPU numbers yet. CI has no GPU; `results/baseline_vs_hal.md` is unpopulated until a follow-up measurement PR.
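The forgetting formula in the list above can be sketched in a few lines. This assumes a full rectangular accuracy matrix `acc[i][j]` (accuracy on task i after training round j); the function name `forgetting` and the accuracy values are hypothetical, chosen only to illustrate the non-negativity property, and are not measurements.

```python
def forgetting(acc):
    """Per-task forgetting: max over all rounds (final round included) minus final accuracy.

    acc[i][j] is the accuracy on task i after training round j, for j = 0..T-1.
    Because the final round takes part in the max, every value is >= 0.
    """
    return [max(row) - row[-1] for row in acc]


# Hypothetical accuracies, not measured results.
acc = [
    [0.80, 0.60],  # task 0 drops after round 1: forgetting of about 0.2
    [0.10, 0.70],  # task 1 peaks on its last round: forgetting is 0.0
]
per_task = forgetting(acc)
mean_forgetting = sum(per_task) / len(per_task)
```

Excluding the final round from the max would instead make the second row negative (0.10 − 0.70), which is the case the cited test guards against.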
    ruff check .
    pytest -q

Smoke tests run on CPU and do not download datasets. CI (`.github/workflows/ci.yml`) runs exactly these two commands.
Paper §2.4 gives the full pseudocode (wake → REM → NREM). This MVP implements the minimum viable subset:
- Wake (`hal_sleep_wake/wake.py`): SGD over LoRA-only params on the current task's `DataLoader`, pushing every batch into the L1 buffer.
- Sleep, NREM consolidation (`hal_sleep_wake/sleep.py`): at the end of each task, LoRA deltas are merged into the ResNet base weights (`W ← W + s · (α/r) · B @ A`, where `s` is `sleep_scale`) and then reset to a fresh Kaiming / zero init. Optimizer state is wiped.
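The merge-then-reset step can be sketched without any dependencies, using nested lists in place of tensors. Several things here are assumptions for illustration only: the function names, treating the weight as a plain 2-D matrix (a Conv2d weight would need reshaping first), assigning the Kaiming-style draw to `A` and zeros to `B` (the usual LoRA convention), and the uniform bound `1/sqrt(fan_in)`. The code in hal_sleep_wake/sleep.py may do this differently.

```python
import random


def matmul(x, y):
    """Nested-list matrix product; a stand-in for the tensor `@` operator."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_and_reset(W, A, B, alpha, sleep_scale=1.0, rng=None):
    """Fold the LoRA delta into W in place, then re-initialise A and B.

    Shapes: W is (out, in), B is (out, r), A is (r, in).
    """
    rng = rng or random.Random(0)
    r = len(A)
    scale = sleep_scale * alpha / r          # s * (alpha / r)
    delta = matmul(B, A)                     # B @ A has the same shape as W
    for i, row in enumerate(delta):
        for j, d in enumerate(row):
            W[i][j] += scale * d             # W <- W + s * (alpha / r) * B @ A
    # Reset (assumed convention): A gets a Kaiming-style uniform draw and
    # B gets zeros, so the fresh adapter contributes nothing until trained.
    bound = (1.0 / len(A[0])) ** 0.5
    for row in A:
        row[:] = [rng.uniform(-bound, bound) for _ in row]
    for row in B:
        row[:] = [0.0 for _ in row]


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]        # rank r = 1
B = [[0.5], [0.0]]
merge_and_reset(W, A, B, alpha=2.0)   # sleep_scale left at 1.0 (unscaled merge)
```

After the call, `W` holds the merged weights and `B @ A` is all zeros again, so the next wake phase starts from the consolidated base.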
REM synthetic-dream generation and entropy-aware context compression (paper §2.4 lines 85–101) are explicit follow-up items.
The target, per paper §5 Implementation Roadmap, is: "50% forgetting reduction on the CIFAR sequence vs a vanilla baseline." This repository will report the raw numbers in results/baseline_vs_hal.md as soon as a GPU run is completed.
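One plausible reading of "50% forgetting reduction" is a relative reduction in mean forgetting against the `--no-sleep` baseline. The sketch below assumes that reading; the function name and the input values are placeholders for illustration, not measured results, and the paper's exact definition may differ.

```python
def forgetting_reduction(baseline, hal):
    """Relative reduction in mean forgetting, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline forgetting must be positive")
    return (baseline - hal) / baseline


# Placeholder values only; no GPU run has been completed yet.
reduction = forgetting_reduction(baseline=0.40, hal=0.20)
meets_target = reduction >= 0.5
```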
TBD (will track the companion paper repository once that is finalized).