
Add ChaosGrad optimizer and TemporalScheduler for RealNet training #6

Merged
theomgdev merged 22 commits into main from dev
Mar 17, 2026
Conversation

@theomgdev
Owner

This pull request refactors the initialization of the RealNetTrainer across multiple proof-of-concept and experiment scripts to standardize optimizer and training configuration. Instead of manually setting the optimizer and loss function, the code now leverages the new ChaosGradConfig class to encapsulate optimizer settings and training parameters, resulting in cleaner and more maintainable experiment scripts.

Trainer initialization and configuration refactor:

  • Replaced manual optimizer setup in all scripts (convergence_gates.py, convergence_identity.py, convergence_mnist.py, convergence_mnist_embed.py, convergence_mnist_record.py, convergence_mnist_revive.py, convergence_mnist_scaled.py, convergence_mnist_tiny.py, convergence_realnet_as_database.py, convergence_sine_wave.py, convergence_adder.py, convergence_detective_thinking.py, convergence_latch.py) with initialization via a ChaosGradConfig passed to RealNetTrainer. This eliminates direct optimizer assignment and manual weight decay/loss function setup, promoting consistency and reducing boilerplate.
  • Updated imports in all affected files to include ChaosGradConfig (and TemporalSchedulerConfig where relevant), reflecting the new dependency and usage pattern.
  • Removed redundant optimizer and scheduler assignments, and moved loss function assignment into the trainer where appropriate, simplifying experiment setup and reducing code duplication.
  • Encapsulated learning rate and weight decay configuration within the ChaosGradConfig presets (default, aggressive, tiny_network), ensuring consistent, experiment-specific settings.
  • Updated learning rate reporting and scheduler usage in convergence_mnist_record.py to reference the trainer's scheduler/optimizer, reflecting the new encapsulation.

Overall, these changes make the experiment scripts easier to maintain, more consistent, and less error-prone by centralizing optimizer and training configuration logic.
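As a sketch, the config-driven pattern described above looks roughly like the following. This is an illustrative stand-in, not the library's actual code: the real ChaosGradConfig and RealNetTrainer live in realnet, and the field names, preset values, and constructor signatures below are assumptions.

```python
from dataclasses import dataclass

# Illustrative stand-in for the config-driven pattern described above.
# Field names and preset values are assumptions, not realnet's actual ones.
@dataclass
class ChaosGradConfigSketch:
    lr: float = 1e-3
    weight_decay: float = 0.0

    @classmethod
    def tiny_network(cls):
        # Preset tuned for small networks (values illustrative only).
        return cls(lr=5e-3, weight_decay=1e-5)

class TrainerSketch:
    """Builds its own optimizer settings from the config (the PR's pattern),
    instead of each experiment script assigning an optimizer by hand."""
    def __init__(self, chaos_config=None):
        self.config = chaos_config or ChaosGradConfigSketch()

# Before: every script set trainer.optimizer / loss_fn manually.
# After: one line selects a consistent, named preset.
trainer = TrainerSketch(chaos_config=ChaosGradConfigSketch.tiny_network())
print(trainer.config.lr)  # → 0.005
```

The point of the refactor is that hyperparameters get a single named home (the preset) instead of being copy-pasted across thirteen experiment scripts.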

…ng engine

- Add ChaosGrad: parameter-group-aware optimizer (chaos_core/projections/lightweight)
  - Gradient centralization, adaptive per-param LR, plateau escape
  - Spectral radius clipping, input gradient sentinel
  - Pre-built configs: default, aggressive, finetune, large_network, tiny_network

- Add TemporalScheduler: adaptive LR scheduler
  - Warmup + cosine decay + loss-aware warm restarts
  - Convergence rate tracking, checkpoint support (state_dict)
  - Pre-built configs: default, llm, short_experiment, finetune, adaptive

- Enhance RealNetTrainer with backward-compatible integration
  - chaos_config/scheduler_config params (opt-in, all defaults preserve legacy)
  - use_chaos_grad=True shortcut for fixed-LR observation mode
  - Diagnostics: get_diagnostics(), get_input_health(), get_spectral_radius()
  - Auto re-init ChaosGrad after neurogenesis (expand)

- Migrate all PoC/experiments to ChaosGrad (scheduler-free, fixed LR)
- Update LIBRARY.md with full documentation
- Update __init__.py exports
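The backward-compatible opt-in described in these notes can be sketched as follows. TrainerOptIn is a hypothetical stand-in; beyond the chaos_config/scheduler_config/use_chaos_grad names quoted above, RealNetTrainer's real constructor signature is an assumption.

```python
# Hypothetical stand-in for the opt-in selection logic described above:
# all defaults preserve legacy behavior; passing a config or the shortcut
# flag switches the trainer onto the ChaosGrad path.
class TrainerOptIn:
    def __init__(self, chaos_config=None, scheduler_config=None, use_chaos_grad=False):
        if chaos_config is not None or use_chaos_grad:
            self.optimizer_kind = "chaos_grad"  # new optimizer path
        else:
            self.optimizer_kind = "legacy"      # old behavior, untouched
        # Scheduler-free fixed-LR mode unless a scheduler config is given.
        self.has_scheduler = scheduler_config is not None

print(TrainerOptIn().optimizer_kind)                     # → legacy
print(TrainerOptIn(use_chaos_grad=True).optimizer_kind)  # → chaos_grad
```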
… residual_mode parameter to RealNet (none, simple, gated)

- none: original behavior (bit-for-bit identical, verified)
- simple: pre-norm residual h = h + f(norm(h)) for gradient flow
- gated: learnable per-neuron gate alpha*h + (1-alpha)*f(norm(h))
- Extract _inject_input() helper for DRY input injection
- Support residual_gate expansion in neurogenesis
- Add RESIDUAL_MODE config to experiment_llm.py
- Update LIBRARY.md, README.md, PoC_STANDARDS.md docs

…d) - Add residual_mode parameter to RealNet (none, simple, gated)

- none: original behavior (bit-for-bit identical, verified)
- simple: pre-norm residual h = h + f(norm(h)) for gradient flow
- gated: learnable per-neuron gate alpha*h + (1-alpha)*f(norm(h))
- Extract _inject_input() helper for DRY input injection
- Support residual_gate expansion in neurogenesis
- Add RESIDUAL_MODE config to experiment_llm.py
- Update LIBRARY.md, README.md, PoC_STANDARDS.md docs"

This reverts commit 80a0bb4.
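For reference, the gated residual update named in the reverted commit, alpha*h + (1-alpha)*f(norm(h)), can be sketched in plain Python. The RMS-style norm and the tanh activation below are assumptions chosen for illustration; the commit message does not specify either.

```python
import math

def rms_norm(h):
    # Simple RMS normalization (illustrative; the actual norm is an assumption).
    rms = math.sqrt(sum(x * x for x in h) / len(h)) or 1.0
    return [x / rms for x in h]

def gated_residual(h, f, alpha):
    """Per-neuron gated residual: alpha*h + (1-alpha)*f(norm(h))."""
    fh = [f(x) for x in rms_norm(h)]
    return [a * x + (1.0 - a) * y for a, x, y in zip(alpha, h, fh)]

h = [1.0, -2.0, 0.5]
# alpha = 1 everywhere reduces to the identity (preserving the original signal);
# alpha = 0 everywhere reduces to a pure pre-norm transform.
print(gated_residual(h, math.tanh, alpha=[1.0, 1.0, 1.0]))  # → [1.0, -2.0, 0.5]
```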
Copilot AI review requested due to automatic review settings March 17, 2026 02:24
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces RealNet-native training components (ChaosGrad optimizer + TemporalScheduler) and refactors RealNetTrainer + various PoC/experiment scripts to use config-driven optimizer/scheduler initialization rather than manual per-script setup.

Changes:

  • Added ChaosGrad/ChaosGradConfig and TemporalScheduler/TemporalSchedulerConfig, and exposed them via realnet.__init__.
  • Refactored RealNetTrainer to auto-select optimizers/schedulers (including re-init behavior after neurogenesis) and added diagnostic helpers.
  • Updated multiple PoC/experiment scripts and library docs to use the new config-based trainer initialization (plus some dataset/hyperparameter changes in the LLM notebook/script).

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| `realnet/utils/realstore.py` | Extends weight transplantation to support re-initializing new regions via an init_new strategy. |
| `realnet/utils/neurogenesis.py` | Updates neurogenesis weight expansion initialization for new connections. |
| `realnet/training/trainer.py` | Adds ChaosGrad/TemporalScheduler integration, optimizer/scheduler selection, and diagnostics. |
| `realnet/training/chaos_scheduler.py` | New adaptive LR scheduler with warmup/cosine decay/restart logic and presets. |
| `realnet/training/chaos_optimizer.py` | New ChaosGrad optimizer with parameter grouping, adaptive LR, plateau escape, and diagnostics. |
| `realnet/core/network.py` | Adds new weight init strategies (micro_quiet, micro_quiet_8bit). |
| `realnet/__init__.py` | Exports ChaosGrad and TemporalScheduler APIs at package top level. |
| `realnet/LIBRARY.md` | Documents ChaosGrad/TemporalScheduler usage via RealNetTrainer and direct usage. |
| `RealNET.ipynb` | Switches the LLM notebook's dataset/tokenizer flow to TinyStories and updates naming/paths. |
| `README_TR.md` | Removes FineWeb-specific wording from an insight bullet. |
| `README.md` | Removes FineWeb-specific wording from an insight bullet. |
| `PoC/experiments/experiment_llm.py` | Refactors trainer init to ChaosGrad + trainer-integrated scheduler; also changes dataset + several hyperparameters. |
| `PoC/experiments/convergence_stopwatch.py` | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| `PoC/experiments/convergence_sine_wave.py` | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| `PoC/experiments/convergence_realnet_as_database.py` | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| `PoC/experiments/convergence_mnist_tiny.py` | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| `PoC/experiments/convergence_mnist_scaled.py` | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| `PoC/experiments/convergence_mnist_revive.py` | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| `PoC/experiments/convergence_mnist_record.py` | Switches optimizer setup to ChaosGradConfig; removes manual scheduler stepping and updates LR reporting source. |
| `PoC/experiments/convergence_mnist_embed.py` | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| `PoC/experiments/convergence_latch.py` | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| `PoC/experiments/convergence_detective_thinking.py` | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| `PoC/experiments/convergence_adder.py` | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| `PoC/convergence_mnist.py` | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| `PoC/convergence_identity.py` | Switches optimizer setup to ChaosGradConfig.tiny_network passed into the trainer. |
| `PoC/convergence_gates.py` | Switches optimizer setup to ChaosGradConfig.tiny_network passed into the trainer. |
Comments suppressed due to low confidence (2)

PoC/experiments/experiment_llm.py:18

  • TemporalSchedulerConfig is imported here but never used (the scheduler config is constructed as a plain dict). Consider removing the unused import, or switching to TemporalSchedulerConfig.*() presets for consistency with the rest of the refactor.

PoC/experiments/experiment_llm.py:86

  • This change switches the tokenizer training dataset from FineWeb to TinyStories (and also adjusts several model/data hyperparameters in this file). That is a significant behavioral change not described in the PR description, which focuses on optimizer/scheduler refactors. Either update the PR description to include the dataset/hyperparameter change, or move these adjustments into a separate PR to keep the scope clear.


Comment on lines +185 to 188

```diff
     print(f"📚 Training new {k_size}k BPE Tokenizer from data slice...")
     tokenizer = ByteLevelBPETokenizer()
-    dataset_sample = load_dataset("HuggingFaceFW/fineweb-edu", name="CC-MAIN-2024-10", split="train", streaming=True)
+    dataset_sample = load_dataset("roneneldan/TinyStories", split="train", streaming=True)
```
```python
def _cosine_lr(self, step):
    """Calculate cosine decay LR multiplier."""
    effective_step = step - self._cycle_start_step
    effective_max = self.max_steps - self._cycle_start_step
    if effective_step >= effective_max:
        return self.min_lr_ratio

    # Cosine decay
    decay_ratio = (effective_step - self.warmup_steps) / max(1, effective_max - self.warmup_steps)
    return self.min_lr_ratio + (1.0 - self.min_lr_ratio) * 0.5 * (1.0 + math.cos(math.pi * decay_ratio))
```
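The warmup-plus-cosine shape of the scheduler can be exercised standalone. The function below is a self-contained re-derivation of that schedule for illustration, not the library's code; the linear warmup branch and the min_lr_ratio floor are inferred from the commit notes above.

```python
import math

def cosine_schedule(step, warmup_steps, max_steps, min_lr_ratio=0.1):
    """LR multiplier: linear warmup to 1.0, then cosine decay to min_lr_ratio."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # linear warmup toward 1.0
    if step >= max_steps:
        return min_lr_ratio                 # floor after the cycle ends
    decay = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr_ratio + (1.0 - min_lr_ratio) * 0.5 * (1.0 + math.cos(math.pi * decay))

print(cosine_schedule(0, 100, 1000))     # → 0.0 (start of warmup)
print(cosine_schedule(100, 100, 1000))   # → 1.0 (warmup complete, decay begins)
print(cosine_schedule(1000, 100, 1000))  # → 0.1 (decayed to the floor)
```

A loss-aware warm restart, as described in the commit notes, would then amount to resetting the cycle start step when the tracked loss plateaus, so the multiplier climbs back up and decays again.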
theomgdev and others added 12 commits March 17, 2026 05:33
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@theomgdev
Owner Author

@copilot what do you think now?

Contributor

Copilot AI commented Mar 17, 2026

@theomgdev I've opened a new pull request, #7, to work on those changes. Once the pull request is ready, I'll request review from you.

@theomgdev theomgdev merged commit 3617b41 into main Mar 17, 2026