- Core Feature: Stores experience transitions (`Transition`) with metadata for `uncertainty` and `td_error`.
- Pruning Logic: Removes "consolidated" memories (low uncertainty AND low TD-error) to prioritize novel/unlearned experiences.
- Uncertainty Estimation: Uses Fisher-informed weight perturbation. Consolidation is measured by testing robustness to changes in low-Fisher weights (unimportant connections); a memory that is robust to these changes is treated as low uncertainty.
- Batch Updates: Implements batch-efficient updates for uncertainty and TD-errors.
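
The pruning rule above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Transition` fields other than `uncertainty` and `td_error`, the default values, and the two threshold parameters are assumptions made for the example.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Transition:
    # Only `uncertainty` and `td_error` are named in the text above;
    # the remaining fields are assumed for illustration.
    state: Any
    action: Any
    reward: float
    next_state: Any
    done: bool
    uncertainty: float = 1.0  # assumed default: new memories start uncertain
    td_error: float = 1.0     # assumed default: new memories start unlearned


class ReplayBuffer:
    def __init__(self, capacity: int):
        # A deque with maxlen acts as a circular buffer:
        # once full, appending evicts the oldest entry.
        self.memory = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.memory.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        k = min(batch_size, len(self.memory))
        return random.sample(list(self.memory), k)

    def prune(self, uncertainty_threshold: float, td_threshold: float) -> int:
        """Drop memories that are BOTH low-uncertainty and low-TD-error.

        Both thresholds are hypothetical parameters. Returns the number removed.
        """
        kept = [
            t for t in self.memory
            if not (t.uncertainty < uncertainty_threshold
                    and abs(t.td_error) < td_threshold)
        ]
        removed = len(self.memory) - len(kept)
        self.memory = deque(kept, maxlen=self.memory.maxlen)
        return removed

    def __len__(self) -> int:
        return len(self.memory)
```

Because the two conditions are combined with AND, a transition with low uncertainty but a large TD-error (or the reverse) stays in the buffer and remains available for replay.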
- Core Feature: Computes and stores the diagonal of the Fisher Information Matrix (`fisher_diagonal`) and `optimal_weights`.
- Function: Quantifies the importance of each parameter for past learning.
- Mechanisms:
  - `compute_fisher()`: Calculates squared gradients using a sampled dataloader from the replay buffer.
  - `decay_fisher()`: Multiplicative decay (`decay_rate=0.77`) to implement "Fisher saturation," allowing old memories to become less protected.
  - `get_ewc_loss()`: Provides the EWC penalty: `λ/2 * Σ F_i * (θ_i - θ*_i)²`.
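
A dependency-free sketch of these three mechanisms is shown below. It assumes per-sample gradients are already available as plain lists (the real code draws them from a dataloader), and that `compute_fisher()` adds to the stored diagonal rather than overwriting it; both are assumptions made for the example, as are the constructor arguments.

```python
from typing import List, Sequence


class FisherMatrix:
    def __init__(self, n_params: int, decay_rate: float = 0.77):
        self.fisher_diagonal: List[float] = [0.0] * n_params
        self.optimal_weights: List[float] = [0.0] * n_params
        self.decay_rate = decay_rate

    def compute_fisher(self, per_sample_grads: Sequence[Sequence[float]]) -> None:
        """Add the mean squared gradient of each parameter to the diagonal.

        Assumption: gradients are passed in directly, one row per sample.
        """
        n = len(per_sample_grads)
        if n == 0:
            return
        for i in range(len(self.fisher_diagonal)):
            mean_sq = sum(g[i] ** 2 for g in per_sample_grads) / n
            self.fisher_diagonal[i] += mean_sq

    def decay_fisher(self) -> None:
        """Multiplicative decay, so older importance estimates fade over time."""
        self.fisher_diagonal = [f * self.decay_rate for f in self.fisher_diagonal]

    def get_ewc_loss(self, weights: Sequence[float], lambda_ewc: float) -> float:
        """EWC penalty: lambda/2 * sum_i F_i * (theta_i - theta*_i)^2."""
        total = sum(
            f * (w - w_star) ** 2
            for f, w, w_star in zip(self.fisher_diagonal, weights, self.optimal_weights)
        )
        return 0.5 * lambda_ewc * total


fm = FisherMatrix(n_params=2)
fm.compute_fisher([[1.0, 2.0], [3.0, 0.0]])  # mean squared gradients: 5.0 and 2.0
fm.decay_fisher()                            # each entry scaled by 0.77
penalty = fm.get_ewc_loss(weights=[1.0, 1.0], lambda_ewc=2.0)
```

With `decay_rate=0.77` and no new gradients added, a Fisher entry is scaled by 0.77^n after n decays, which is roughly 7% of its original value after ten decays.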
- Core Feature: Wraps the task loss, adding the EWC penalty to prevent catastrophic forgetting.
- Loss Equation: `Total Loss = Task Loss + λ * EWC Penalty`.
- Hyperparameter: `lambda_ewc: 155000.0` (set in default config for extreme protection).
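
Written out in full, the objective and its per-parameter gradient take the form below. This reading assumes that "EWC Penalty" in the loss equation denotes the unscaled sum ½ Σ F_i (θ_i − θ*_i)², so that λ is applied only once and the second term matches the `λ/2 * Σ F_i * (θ_i - θ*_i)²` penalty; the source does not state this explicitly.

```latex
\begin{aligned}
\mathcal{L}_{\text{total}}(\theta)
  &= \mathcal{L}_{\text{task}}(\theta)
   + \frac{\lambda}{2} \sum_i F_i \, (\theta_i - \theta^*_i)^2 \\
\frac{\partial \mathcal{L}_{\text{total}}}{\partial \theta_i}
  &= \frac{\partial \mathcal{L}_{\text{task}}}{\partial \theta_i}
   + \lambda \, F_i \, (\theta_i - \theta^*_i)
\end{aligned}
```

As a purely illustrative calculation with λ = 155000: a parameter with a hypothetical Fisher value F_i = 1e-4 that has drifted 0.01 from θ*_i adds 155000/2 × 1e-4 × 0.01² = 7.75e-4 to the loss and contributes a restoring gradient of 155000 × 1e-4 × 0.01 = 0.155.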
- Orchestration: Manages the full continual learning cycle (PPO-based policy learning).
- Wake Phase: Agent interacts with the environment, collects data into a PPO memory buffer and the long-term `ReplayBuffer`.
- Sleep Phase (`sleep_cycle`):
  - PPO Update: Policy and Value functions are updated using PPO with GAE.
  - Consolidation Replay: Additional replay of samples from the buffer.
  - Fisher Update: Computes the new Fisher Information Matrix.
  - Homeostatic Scaling: Applies synaptic scaling to the CfC brain (REM-like).
  - Epistemic Pruning: Updates uncertainties/TD-errors and removes consolidated memories.
  - Optimal Weights Update: Stores current weights as new `optimal_weights` for EWC.
- Action Exploration: Implements annealing of action standard deviation.
- Checkpointing: Full state save/load capabilities for the agent, world, and memory systems.
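
The wake/sleep ordering described above can be sketched as a small skeleton. Everything here is hypothetical scaffolding: the class name, the hook names, and the callback-based design are assumptions for illustration, and each hook stands in for the real PPO, replay, Fisher, scaling, and pruning logic. The annealing helper uses a multiplicative schedule with a floor, which is also an assumption, since the schedule is not specified in this summary.

```python
from typing import Callable, Dict, List

# Sleep steps in the order listed for `sleep_cycle` (names are hypothetical).
SLEEP_STEPS = (
    "ppo_update",
    "consolidation_replay",
    "fisher_update",
    "homeostatic_scaling",
    "epistemic_pruning",
    "optimal_weights_update",
)


class SleepWakeTrainer:
    """Hypothetical skeleton: alternates a wake phase with an ordered sleep cycle."""

    def __init__(self, wake: Callable[[], None],
                 sleep_hooks: Dict[str, Callable[[], None]]):
        missing = [s for s in SLEEP_STEPS if s not in sleep_hooks]
        if missing:
            raise ValueError(f"missing sleep hooks: {missing}")
        self.wake = wake
        self.sleep_hooks = sleep_hooks
        self.log: List[str] = []  # records the order in which phases ran

    def sleep_cycle(self) -> None:
        # Run every sleep step once, in the fixed order above.
        for name in SLEEP_STEPS:
            self.sleep_hooks[name]()
            self.log.append(name)

    def train(self, n_cycles: int) -> None:
        for _ in range(n_cycles):
            self.wake()  # collect experience
            self.log.append("wake")
            self.sleep_cycle()  # consolidate


def anneal_action_std(std: float, decay: float = 0.99, min_std: float = 0.1) -> float:
    """Assumed schedule: shrink the action std multiplicatively, never below min_std."""
    return max(min_std, std * decay)
```

Storing `optimal_weights` as the last sleep step means the EWC anchor for the next cycle reflects the weights after all consolidation updates have been applied.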
tests/unit/test_replay_buffer.py (4 tests)
✓ test_replay_buffer_init
✓ test_push_and_sample
✓ test_circular_buffer
✓ test_pruning
tests/unit/test_fisher_matrix.py (3 tests)
✓ test_fisher_initialization
✓ test_fisher_decay
✓ test_ewc_loss
scripts/test_phase4.py (10 integration tests)
✓ Test 1: Replay Buffer working
✓ Test 2: Fisher matrix working
✓ Test 3: EWC loss working
✓ Test 4: Sleep-wake trainer working
✓ Test 5: Checkpointing verified
✓ Test 6: Multi-cycle training
✓ Test 7: Uncertainty Variation Check
✓ Test 8: Pruning Effectiveness Check
✓ Test 9: Fisher Accumulation Check
✓ Test 10: Consolidation Progress Check
The core architecture for embodied continual learning is now fully operational. The system is ready for long-horizon experiments to validate the research goals:
- Validate Fisher Information saturation at ~20 days.
- Demonstrate stable learning across 3+ simulated years.
- Measure context integration (φ_seasonal) and test for consciousness prerequisites.