
Phase 4: Memory and Training - COMPLETE ✅

Components Implemented

1. Replay Buffer with Epistemic Pruning (soliter/memory/replay_buffer.py)

  • Core Feature: Stores experience transitions (Transition) with metadata for uncertainty and td_error.
  • Pruning Logic: Removes "consolidated" memories (low uncertainty AND low TD-error) to prioritize novel/unlearned experiences.
  • Uncertainty Estimation: Uses Fisher-informed weight perturbation: it perturbs low-Fisher weights (unimportant connections) and measures how much the output changes. Outputs that are robust to these perturbations indicate a consolidated memory, i.e., low uncertainty.
  • Batch Updates: Implements batch-efficient updates for uncertainty and TD-errors.
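The pruning rule above can be sketched as follows. This is a minimal illustration, not the repo's actual `ReplayBuffer` API: the `Transition` fields, `MiniReplayBuffer` class name, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class Transition:
    # Illustrative fields; the real Transition likely carries more.
    state: tuple
    action: int
    reward: float
    uncertainty: float
    td_error: float

class MiniReplayBuffer:
    """Sketch of epistemic pruning: drop memories that are both
    low-uncertainty AND low-TD-error ("consolidated")."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)  # circular storage

    def push(self, t: Transition):
        self.buffer.append(t)

    def prune(self, unc_thresh=0.1, td_thresh=0.1):
        # Keep a transition if it is still novel (high uncertainty)
        # OR still mispredicted (high TD-error); drop the rest.
        kept = [t for t in self.buffer
                if t.uncertainty >= unc_thresh or abs(t.td_error) >= td_thresh]
        removed = len(self.buffer) - len(kept)
        self.buffer = deque(kept, maxlen=self.buffer.maxlen)
        return removed
```

Keeping the OR condition (rather than AND) is the key design choice: a memory must be both well-understood and well-predicted before it is safe to discard.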

2. Fisher Information Matrix (soliter/memory/fisher_matrix.py)

  • Core Feature: Computes and stores the diagonal of the Fisher Information Matrix (fisher_diagonal) and optimal_weights.
  • Function: Quantifies the importance of each parameter for past learning.
  • Mechanisms:
    • compute_fisher(): Calculates squared gradients using a sampled dataloader from the replay buffer.
    • decay_fisher(): Multiplicative decay (decay_rate=0.77) to implement "Fisher saturation," allowing old memories to become less protected.
    • get_ewc_loss(): Provides the EWC penalty: λ/2 * Σ F_i * (θ_i - θ*_i)².

3. Elastic Weight Consolidation (EWC) Loss (soliter/training/ewc.py)

  • Core Feature: Wraps the task loss, adding the EWC penalty to prevent catastrophic forgetting.
  • Loss Equation: Total Loss = Task Loss + λ/2 * Σ F_i * (θ_i - θ*_i)² (the λ factor is applied inside get_ewc_loss).
  • Hyperparameter: lambda_ewc: 155000.0 (set in default config for extreme protection).

4. Sleep-Wake Trainer (soliter/training/sleep_wake.py)

  • Orchestration: Manages the full continual learning cycle (PPO-based policy learning).
  • Wake Phase: Agent interacts with the environment, collects data into a PPO memory buffer and the long-term ReplayBuffer.
  • Sleep Phase (sleep_cycle):
    1. PPO Update: Policy and Value functions are updated using PPO with GAE.
    2. Consolidation Replay: Replays additional samples from the long-term buffer to reinforce prior experience.
    3. Fisher Update: Computes the new Fisher Information Matrix.
    4. Homeostatic Scaling: Applies synaptic scaling to the CfC brain (REM-like).
    5. Epistemic Pruning: Updates uncertainties/TD-errors and removes consolidated memories.
    6. Optimal Weights Update: Stores current weights as new optimal_weights for EWC.
  • Action Exploration: Anneals the action standard deviation over training, shifting from exploration toward exploitation.
  • Checkpointing: Full state save/load capabilities for the agent, world, and memory systems.

Test Results: 64/64 Passing (Phase 2 and 3 suites plus Phase 4 unit tests)

Phase 4 Unit Tests

tests/unit/test_replay_buffer.py (4 tests)
  ✓ test_replay_buffer_init
  ✓ test_push_and_sample
  ✓ test_circular_buffer
  ✓ test_pruning

tests/unit/test_fisher_matrix.py (3 tests)
  ✓ test_fisher_initialization
  ✓ test_fisher_decay
  ✓ test_ewc_loss

scripts/test_phase4.py (10 integration tests)
  ✓ Test 1: Replay Buffer working
  ✓ Test 2: Fisher matrix working
  ✓ Test 3: EWC loss working
  ✓ Test 4: Sleep-wake trainer working
  ✓ Test 5: Checkpointing verified
  ✓ Test 6: Multi-cycle training
  ✓ Test 7: Uncertainty Variation Check
  ✓ Test 8: Pruning Effectiveness Check
  ✓ Test 9: Fisher Accumulation Check
  ✓ Test 10: Consolidation Progress Check

Next: Final Experiments and Evaluation

The core architecture for embodied continual learning is now fully operational. The system is ready for long-horizon experiments to validate the research goals:

  1. Validate Fisher Information saturation at ~20 days.
  2. Demonstrate stable learning across 3+ simulated years.
  3. Measure context integration (φ_seasonal) and test for consciousness prerequisites.