Conversation
…ng engine
- Add ChaosGrad: parameter-group-aware optimizer (chaos_core/projections/lightweight)
- Gradient centralization, adaptive per-param LR, plateau escape
- Spectral radius clipping, input gradient sentinel
- Pre-built configs: default, aggressive, finetune, large_network, tiny_network
- Add TemporalScheduler: adaptive LR scheduler
- Warmup + cosine decay + loss-aware warm restarts
- Convergence rate tracking, checkpoint support (state_dict)
- Pre-built configs: default, llm, short_experiment, finetune, adaptive
- Enhance RealNetTrainer with backward-compatible integration
- chaos_config/scheduler_config params (opt-in, all defaults preserve legacy)
- use_chaos_grad=True shortcut for fixed-LR observation mode
- Diagnostics: get_diagnostics(), get_input_health(), get_spectral_radius()
- Auto re-init ChaosGrad after neurogenesis (expand)
- Migrate all PoC/experiments to ChaosGrad (scheduler-free, fixed LR)
- Update LIBRARY.md with full documentation
- Update __init__.py exports
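Among the techniques the commit lists for ChaosGrad is gradient centralization. As a rough sketch of that standard technique (illustrative only — this is not RealNet's actual optimizer code), each multi-dimensional gradient is shifted to zero mean per output slice before the update:

```python
import numpy as np

def centralize_gradient(grad: np.ndarray) -> np.ndarray:
    """Gradient centralization: subtract the mean over all axes except
    the first (output) dimension, so each output slice has zero mean."""
    if grad.ndim > 1:
        axes = tuple(range(1, grad.ndim))
        return grad - grad.mean(axis=axes, keepdims=True)
    return grad  # 1-D params (e.g. biases) are left untouched

g = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
gc = centralize_gradient(g)
# each row of gc now sums to (numerically) zero
```

How ChaosGrad combines this with its adaptive per-param LR, plateau escape, and spectral radius clipping is specific to `chaos_optimizer.py`; the sketch only shows the centralization step.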
… residual_mode parameter to RealNet (none, simple, gated)
- none: original behavior (bit-for-bit identical, verified)
- simple: pre-norm residual h=h+f(norm(h)) for gradient flow
- gated: learnable per-neuron gate alpha*h+(1-alpha)*f(norm(h))
- Extract _inject_input() helper for DRY input injection
- Support residual_gate expansion in neurogenesis
- Add RESIDUAL_MODE config to experiment_llm.py
- Update LIBRARY.md, README.md, PoC_STANDARDS.md docs
…d)
"Add residual_mode parameter to RealNet (none, simple, gated)
- none: original behavior (bit-for-bit identical, verified)
- simple: pre-norm residual h=h+f(norm(h)) for gradient flow
- gated: learnable per-neuron gate alpha*h+(1-alpha)*f(norm(h))
- Extract _inject_input() helper for DRY input injection
- Support residual_gate expansion in neurogenesis
- Add RESIDUAL_MODE config to experiment_llm.py
- Update LIBRARY.md, README.md, PoC_STANDARDS.md docs"

This reverts commit 80a0bb4.
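The three residual_mode variants named in the commit message can be sketched in a few lines (illustrative only — `f`, `norm`, and the RMS normalizer below are stand-ins, not RealNet's actual layer code):

```python
import numpy as np

def layer_step(h, f, norm, mode="none", alpha=None):
    """Sketch of the three residual_mode variants from the commit message.

    h     : hidden state, shape (n_neurons,)
    f     : the layer transform
    norm  : normalization function applied pre-residual
    alpha : per-neuron gate, used only when mode='gated'
    """
    if mode == "none":      # original behavior, no residual path
        return f(h)
    if mode == "simple":    # pre-norm residual for gradient flow
        return h + f(norm(h))
    if mode == "gated":     # learnable per-neuron interpolation
        return alpha * h + (1.0 - alpha) * f(norm(h))
    raise ValueError(f"unknown residual_mode: {mode}")

rms = lambda x: x / np.sqrt(np.mean(x**2) + 1e-8)  # stand-in norm
f = lambda x: np.tanh(x)                           # stand-in transform
h = np.array([1.0, -2.0, 3.0])
out = layer_step(h, f, rms, mode="gated", alpha=np.full(3, 0.9))
```

Note that with alpha at 1.0 the gated mode passes `h` through unchanged, which is why a gate initialized near 1 preserves pretrained behavior.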
Contributor
Pull request overview
This PR introduces RealNet-native training components (ChaosGrad optimizer + TemporalScheduler) and refactors RealNetTrainer + various PoC/experiment scripts to use config-driven optimizer/scheduler initialization rather than manual per-script setup.
Changes:
- Added `ChaosGrad`/`ChaosGradConfig` and `TemporalScheduler`/`TemporalSchedulerConfig`, and exposed them via `realnet.__init__`.
- Refactored `RealNetTrainer` to auto-select optimizers/schedulers (including re-init behavior after neurogenesis) and added diagnostic helpers.
- Updated multiple PoC/experiment scripts and library docs to use the new config-based trainer initialization (plus some dataset/hyperparameter changes in the LLM notebook/script).
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| realnet/utils/realstore.py | Extends weight transplantation to support re-initializing new regions via an init_new strategy. |
| realnet/utils/neurogenesis.py | Updates neurogenesis weight expansion initialization for new connections. |
| realnet/training/trainer.py | Adds ChaosGrad/TemporalScheduler integration, optimizer/scheduler selection, and diagnostics. |
| realnet/training/chaos_scheduler.py | New adaptive LR scheduler with warmup/cosine decay/restart logic and presets. |
| realnet/training/chaos_optimizer.py | New ChaosGrad optimizer with parameter grouping, adaptive LR, plateau escape, and diagnostics. |
| realnet/core/network.py | Adds new weight init strategies (micro_quiet, micro_quiet_8bit). |
| realnet/__init__.py | Exports ChaosGrad and TemporalScheduler APIs at package top-level. |
| realnet/LIBRARY.md | Documents ChaosGrad/TemporalScheduler usage via RealNetTrainer and direct usage. |
| RealNET.ipynb | Switches the LLM notebook’s dataset/tokenizer flow to TinyStories and updates naming/paths. |
| README_TR.md | Removes FineWeb-specific wording from an insight bullet. |
| README.md | Removes FineWeb-specific wording from an insight bullet. |
| PoC/experiments/experiment_llm.py | Refactors trainer init to ChaosGrad + trainer-integrated scheduler; also changes dataset + several hyperparameters. |
| PoC/experiments/convergence_stopwatch.py | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| PoC/experiments/convergence_sine_wave.py | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| PoC/experiments/convergence_realnet_as_database.py | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| PoC/experiments/convergence_mnist_tiny.py | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| PoC/experiments/convergence_mnist_scaled.py | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| PoC/experiments/convergence_mnist_revive.py | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| PoC/experiments/convergence_mnist_record.py | Switches optimizer setup to ChaosGradConfig; removes manual scheduler stepping and updates LR reporting source. |
| PoC/experiments/convergence_mnist_embed.py | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| PoC/experiments/convergence_latch.py | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| PoC/experiments/convergence_detective_thinking.py | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| PoC/experiments/convergence_adder.py | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| PoC/convergence_mnist.py | Switches optimizer setup to ChaosGradConfig passed into the trainer. |
| PoC/convergence_identity.py | Switches optimizer setup to ChaosGradConfig.tiny_network passed into the trainer. |
| PoC/convergence_gates.py | Switches optimizer setup to ChaosGradConfig.tiny_network passed into the trainer. |
Comments suppressed due to low confidence (2)
PoC/experiments/experiment_llm.py:18
`TemporalSchedulerConfig` is imported here but never used (the scheduler config is constructed as a plain dict). Consider removing the unused import, or switching to `TemporalSchedulerConfig.*()` presets for consistency with the rest of the refactor.
PoC/experiments/experiment_llm.py:86
This script change switches the tokenizer training dataset from FineWeb to TinyStories (and also adjusts several model/data hyperparameters in this file). That's a significant behavioral change not described in the PR description, which focuses on optimizer/scheduler refactors. Either update the PR description to include the dataset/hyperparameter change, or move these adjustments into a separate PR to keep the scope clear.
Comment on lines +185 to 188
```diff
  " print(f\"📚 Training new {k_size}k BPE Tokenizer from data slice...\")\n",
  " tokenizer = ByteLevelBPETokenizer()\n",
- " dataset_sample = load_dataset(\"HuggingFaceFW/fineweb-edu\", name=\"CC-MAIN-2024-10\", split=\"train\", streaming=True)\n",
+ " dataset_sample = load_dataset(\"roneneldan/TinyStories\", split=\"train\", streaming=True)\n",
  " \n",
```
realnet/training/chaos_scheduler.py (Outdated)

```python
def _cosine_lr(self, step):
    """Calculate cosine decay LR multiplier."""
    effective_step = step - self._cycle_start_step
    effective_max = self.max_steps - self._cycle_start_step
```
realnet/training/chaos_scheduler.py
Outdated
| return self.min_lr_ratio | ||
|
|
||
| # Cosine decay | ||
| decay_ratio = (effective_step - self.warmup_steps) / max(1, effective_max - self.warmup_steps) |
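The scheduler hunks above are fragments; a self-contained sketch of a warmup + cosine-decay multiplier with a movable cycle start may make the shape clearer. The names `warmup_steps`, `min_lr_ratio`, and the `_cycle_start_step` offset come from the excerpts; everything else is an illustrative reconstruction, not the actual chaos_scheduler.py code:

```python
import math

def lr_multiplier(step, warmup_steps, max_steps,
                  min_lr_ratio=0.1, cycle_start_step=0):
    """Warmup + cosine decay LR multiplier. A warm restart simply
    moves cycle_start_step forward, restarting the curve."""
    effective_step = step - cycle_start_step
    effective_max = max_steps - cycle_start_step
    if effective_step < warmup_steps:
        # linear warmup from 0 to 1
        return effective_step / max(1, warmup_steps)
    if effective_step >= effective_max:
        return min_lr_ratio
    # cosine decay from 1.0 down to min_lr_ratio
    decay_ratio = (effective_step - warmup_steps) / max(1, effective_max - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))
    return min_lr_ratio + (1.0 - min_lr_ratio) * cosine
```

The `max(1, ...)` guards mirror the excerpt's division guard: they keep the multiplier well-defined even when a restart lands so late that the remaining window is shorter than the warmup.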
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…ne shrinking on warm restart
Owner
Author
@copilot what do you think now?
Contributor
@theomgdev I've opened a new pull request, #7, to work on those changes. Once the pull request is ready, I'll request review from you.
This pull request refactors the initialization of the `RealNetTrainer` across multiple proof-of-concept and experiment scripts to standardize optimizer and training configuration. Instead of manually setting the optimizer and loss function, the code now leverages the new `ChaosGradConfig` class to encapsulate optimizer settings and training parameters, resulting in cleaner and more maintainable experiment scripts.

Trainer initialization and configuration refactor:
- Replaced manual optimizer/loss setup in the PoC and experiment scripts (`convergence_gates.py`, `convergence_identity.py`, `convergence_mnist.py`, `convergence_mnist_embed.py`, `convergence_mnist_record.py`, `convergence_mnist_revive.py`, `convergence_mnist_scaled.py`, `convergence_mnist_tiny.py`, `convergence_realnet_as_database.py`, `convergence_sine_wave.py`, `convergence_adder.py`, `convergence_detective_thinking.py`, `convergence_latch.py`) with initialization using `ChaosGradConfig` passed to `RealNetTrainer`. This eliminates direct optimizer assignment and manual weight decay/loss function setup, promoting consistency and reducing boilerplate.
- Updated imports to include `ChaosGradConfig` (and `TemporalSchedulerConfig` where relevant), reflecting the new dependency and usage pattern.
- Selected `ChaosGradConfig` presets (`default`, `aggressive`, `tiny_network`) per script, ensuring consistent and experiment-specific settings.
- Updated `convergence_mnist_record.py` to reference the trainer's scheduler/optimizer, reflecting the new encapsulation.

Overall, these changes make the experiment scripts easier to maintain, more consistent, and less error-prone by centralizing optimizer and training configuration logic.
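As a rough illustration of the config-preset pattern the PR describes: only the class and preset names (`ChaosGradConfig`, `tiny_network`, `aggressive`) come from the PR text; the fields, default values, and trainer call below are guesses, not RealNet's actual API:

```python
from dataclasses import dataclass

@dataclass
class ChaosGradConfig:
    """Stand-in for the real config: a plain dataclass whose
    classmethod presets replace hand-built optimizer setup."""
    lr: float = 1e-3            # field names/values are illustrative
    weight_decay: float = 0.01

    @classmethod
    def tiny_network(cls):
        # hypothetical preset for very small models
        return cls(lr=3e-3, weight_decay=0.0)

    @classmethod
    def aggressive(cls):
        # hypothetical preset trading stability for speed
        return cls(lr=5e-3, weight_decay=0.01)

cfg = ChaosGradConfig.tiny_network()
# trainer = RealNetTrainer(net, chaos_config=cfg)  # per the PR's described API
```

The point of the pattern is that each script picks a named preset instead of repeating optimizer wiring, so a tuning change lands in one place rather than twenty scripts.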