[WIP/Draft] feat(models): ContinuousESN — PR3 (stacked on #450) #456
Saswatsusmoy wants to merge 3 commits into
Thanks @MartinuzziFrancesco: both the §3.2.6 pointer and the "skip the struct, run it raw" suggestion turned out to land in interesting places.

On §3.2.6 (Lukoševičius 2012, "Leaking Rate")

The relevant ODE is eq (5):

$$\dot{\mathbf{x}} = -\mathbf{x} + \tanh\left(\mathbf{W}^{in}[1;\mathbf{u}] + \mathbf{W}\mathbf{x}\right)$$

No […]

This reframes PR3 quite a bit. The Anantharaman CTESN paper uses […]

Smoke test — eq (5) raw on Lorenz, no CTESN struct

Ran the tutorial pattern scaled to […]

On the first pass, train NRMSE was 0.0004 (the readout fits beautifully), but autoregressive rollout NRMSE was ~1.5 at all horizons. I tracked it down to the autoregressive […]

`current_state = res.prob.u0 # → zeros(N) every time`

The trained reservoir's terminal state never makes it into the predict-time […]
So eq (5) forecasts Lorenz at N=300 to ~3 Lyapunov times under NRMSE ≈ 0.11 before chaotic divergence dominates. Not quite our PR3 target of NRMSE < 0.1, but within striking distance of it before any hyperparameter tuning (spectral radius vs spectral norm, leak/time-constant, washout length, regularisation). I think the < 0.1 target is reachable. What I'd like your read on:
Will hold off on writing real code until you've had a chance to push back on any of this; it's easier to redo the design now than after I've committed 600 lines.
Thanks for running the experiments; they are quite interesting, and I think necessary even for merging #450. To address your point number 1), I think having the two code snippets side by side would help in understanding whether this is something that could be a gotcha for the user, and therefore whether we should find a way to deal with it in an elegant manner. As for 2), I think we can go with […]
Thanks @MartinuzziFrancesco. I re-ran both paths from scratch for the side-by-side; the other two answers are below.

Cold vs warmed predict — side-by-side

Both scripts are identical except for the warmup step. Reproducible numbers on Julia 1.12.5 with […]

| Horizon | Cold rollout NRMSE (u0 = 0) | Warmed rollout NRMSE (u0 = trained terminal) |
|---|---|---|
| 1 t_λ (55 steps) | 1.4893 | 0.1568 |
| 2 t_λ (110 steps) | 1.4960 | 0.1183 |
| 3 t_λ (166 steps) | 1.7431 | 0.1118 |
| 4 t_λ (221 steps) | 1.6959 | 0.2099 |
| 6 t_λ (331 steps) | 1.6444 | 0.7294 |
| 8 t_λ (442 steps) | 1.4708 | 0.9808 |
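
The step counts in the Horizon column are consistent with steps = round(k / (λ_max · Δt)) for a horizon of k Lyapunov times. A minimal sketch of that conversion, assuming λ_max ≈ 0.9056 (a commonly quoted largest Lyapunov exponent for Lorenz '63) and Δt = 0.02; neither value is stated in the thread, so both are assumptions that happen to reproduce the column:

```julia
# Convert a horizon of k Lyapunov times (t_λ = 1/λ_max) into integration steps.
# ASSUMPTION: λ_max ≈ 0.9056 and Δt = 0.02 are illustrative, not from the thread.
lyapunov_steps(k; λmax = 0.9056, Δt = 0.02) = round(Int, k / (λmax * Δt))

horizons = [1, 2, 3, 4, 6, 8]
steps = lyapunov_steps.(horizons)

for (k, n) in zip(horizons, steps)
    println(k, " t_λ ≈ ", n, " steps")
end
```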
Train NRMSE (teacher-forced one-step on training data) was 0.0004 in both cases — the readout fits cleanly; the cold path's failure is purely the predict-time u0.
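For reference, a minimal sketch of the NRMSE behind these numbers, assuming the definition quoted in the PR description (Lukoševičius 2012 eq 1: `sqrt(mean((ŷ-y)²) / var(y))` per output, averaged). The row/column layout and the use of the population variance (`corrected = false`) are assumptions of this sketch, chosen so that predicting the per-output mean scores exactly 1; values above 1, like the cold column, are worse than that constant forecast:

```julia
using Statistics

# NRMSE per output row, then averaged over outputs:
# sqrt(mean((ŷ - y)²) / var(y)).
# ASSUMPTIONS: rows = output dimensions, columns = time steps;
# var uses corrected = false (population variance).
function nrmse(ŷ::AbstractMatrix, y::AbstractMatrix)
    size(ŷ) == size(y) || throw(DimensionMismatch("ŷ and y must have the same size"))
    per_output = map(axes(y, 1)) do i
        yi, ŷi = view(y, i, :), view(ŷ, i, :)
        sqrt(mean(abs2, ŷi .- yi) / var(yi; corrected = false))
    end
    return mean(per_output)
end

# Usage: a perfect forecast scores 0; the constant per-row mean forecast scores 1.
y = [sin(0.1 * t + k) for k in 1:3, t in 1:200]
ȳ = repeat(mean(y; dims = 2), 1, size(y, 2))
```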
Two small caveats on what I ran:
- Eq (5) bundles the bias as the leading column of `W_in` in `W_in[1;u]`; my script splits it into a separate `+ b` term. Algebraically identical, just a different parameter layout.
- `warm_u0 = trained_states[:, end]` is the post-modifier state if any `state_modifiers` are wired up. None are here, so it equals the raw reservoir terminal. If a real warmup API were to expose this, "warm from the raw reservoir terminal" and "warm from the modified terminal" would need to be a deliberate choice.
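To make the first caveat concrete, here is a small check that the two layouts agree, assuming a random augmented input matrix `W_in_aug` (a name used only here) whose first column plays the role of the bias:

```julia
using Random

# Eq (5) layout: the bias is the leading column of W^in, multiplied by [1; u].
rng = MersenneTwister(42)
in_dims, res_dims = 3, 5
W_in_aug = randn(rng, res_dims, in_dims + 1)
u = randn(rng, in_dims)
bundled = W_in_aug * [1; u]

# Split layout (as in the script): W_in u + b, reading b off the leading column.
W_in = W_in_aug[:, 2:end]
b = W_in_aug[:, 1]
split_form = W_in * u + b
```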
If you read this as a real footgun, the lightest-touch fix I can see is a warmup_data::Union{Nothing, AbstractMatrix} = nothing kwarg on the SciMLProblemReservoir paths of predict, where a non-nothing value triggers collectstates(rc, warmup_data, ps, st) internally and seeds u0 from the last column. No behaviour change for users who don't pass it. Happy to follow your lead on whether this lands in #450 or in PR3's wrapper.
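A sketch of the seeding logic behind the proposed `warmup_data` kwarg, under loud assumptions: `toy_collectstates` and `initial_state` are hypothetical stand-ins (not ReservoirComputing.jl API), and the reservoir is a plain explicit-Euler discretisation of eq (5) rather than a SciML solve. Passing `nothing` reproduces today's cold start; passing a matrix drives the reservoir and seeds `u0` from the last state column:

```julia
using Random

# HYPOTHETICAL stand-in for collectstates: explicit Euler on eq (5),
# x ← x + Δt * (-x + tanh(W_in [1; u] + W x)), one state column per input column.
function toy_collectstates(W_in, W, data; Δt = 0.1)
    x = zeros(size(W, 1))
    states = similar(x, size(W, 1), size(data, 2))
    for (j, u) in enumerate(eachcol(data))
        x = x .+ Δt .* (-x .+ tanh.(W_in * [1; u] .+ W * x))
        states[:, j] = x
    end
    return states
end

# HYPOTHETICAL kwarg pattern: `nothing` keeps the cold start (zeros),
# a matrix warms the reservoir and seeds u0 from the last state column.
function initial_state(W_in, W; warmup_data::Union{Nothing, AbstractMatrix} = nothing)
    warmup_data === nothing && return zeros(size(W, 1))
    return toy_collectstates(W_in, W, warmup_data)[:, end]
end

rng = MersenneTwister(1)
N, D = 20, 3
W_in = 0.5 .* randn(rng, N, D + 1)
W = 0.1 .* randn(rng, N, N)
data = randn(rng, D, 50)
cold = initial_state(W_in, W)
warm = initial_state(W_in, W; warmup_data = data)
```

Keeping the default at `nothing` is what makes the change non-breaking for existing callers.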
On the naming
ContinuousESN works for me. I'll rename the stub on add-ctesn → new branch + new file + re-titled PR. The Anantharaman parametric-surrogate flavour can be its own type in its own PR if anyone needs it.
On validation via Grassberger-Procaccia
Will pull in FractalDimensions.jl (v1.9.6 as of writing). Plan is grassberger_proccacia_dim(StateSpaceSet(...)) on both the true Lorenz attractor and the warmed-rollout attractor, then assert the predicted dimension matches reference (Lorenz '63 correlation dimension is reported at ≈ 2.05 in the literature) to some agreed tolerance, alongside the attractor-reconstruction eye-test from the README. I won't quote actual fractal-dim numbers from my run until I've actually computed them — flagging the plan rather than the result.
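Since the FractalDimensions.jl call is still to be wired up, here is a self-contained sketch of the Grassberger-Procaccia estimate itself (the slope of log C(ε) against log ε, where C(ε) is the fraction of point pairs closer than ε) plus a tolerance check of the kind planned. The function names, ε range, and tolerance are all assumptions for illustration; the sketch is sanity-checked on points uniform in the unit square (dimension 2), not on Lorenz:

```julia
using Random, Statistics

# All pairwise Euclidean distances between points stored one per column.
function pairwise_distances(X::AbstractMatrix)
    n = size(X, 2)
    d = Float64[]
    sizehint!(d, n * (n - 1) ÷ 2)
    for i in 1:n-1, j in i+1:n
        s = 0.0
        for k in axes(X, 1)
            s += (X[k, i] - X[k, j])^2
        end
        push!(d, sqrt(s))
    end
    return d
end

# Correlation sum C(ε): fraction of distinct pairs closer than ε.
correlation_sum(d, ε) = count(<(ε), d) / length(d)

# Grassberger-Procaccia estimate: least-squares slope of log C(ε) against log ε.
function correlation_dimension(X::AbstractMatrix, εs)
    d = pairwise_distances(X)
    lx = log.(εs)
    ly = log.(correlation_sum.(Ref(d), εs))
    return sum((lx .- mean(lx)) .* (ly .- mean(ly))) / sum(abs2, lx .- mean(lx))
end

# ASSUMPTION: 0.15 is a placeholder for the "agreed tolerance".
dimension_matches(d_pred, d_ref; tol = 0.15) = abs(d_pred - d_ref) <= tol

# Sanity check: uniform points in the unit square (expected dimension 2;
# boundary effects bias the estimate slightly low at larger ε).
rng = MersenneTwister(0)
X = rand(rng, 2, 1000)
εs = exp.(range(log(0.02), log(0.1); length = 8))
dim_est = correlation_dimension(X, εs)
```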
Francesco's call on SciML#456: drop the CTESN name (which referred to the Anantharaman 2021 parametric-surrogate paper) and ship instead a thin wrapper around the Lukoševičius 2012 §3.2.6 eq (5) leaky-integrator continuous ESN. Stub file moves from `src/extensions/` to `src/models/` to sit alongside ESN as a semantic sibling; docstring rewritten to reflect the new scope (leaky-integrator default, parametric-surrogate CTESN explicitly out of scope).
Skeleton-only placeholder so the draft PR exists as a discussion vehicle. No include, no export, no tests — the body errors with a clear message directing the user to the (not-yet-implemented) extension constructor. Open design questions are tracked in the PR description. Stacked on PR SciML#450; will be rebased onto master once that merges.
Prior commit captured only the file rename — the docstring content update was on disk but not staged. This commit folds in the actual docstring rewrite: references Lukoševičius 2012 §3.2.6 eq (5) as the canonical ODE, explains how the discrete leak rate α maps to the continuous Euler step, and explicitly distinguishes from the Anantharaman 2021 parametric-surrogate CTESN.
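The α-to-Euler-step mapping that the docstring describes can be checked numerically: one explicit-Euler step of eq (5) with Δt = α is exactly the discrete leaky-integrator update x ← (1 − α)x + α tanh(W_in[1; u] + Wx), up to time-indexing conventions. A minimal sketch with small random matrices (sizes, seed, and α are arbitrary choices):

```julia
using Random

# One explicit-Euler step of eq (5), ẋ = -x + tanh(W_in [1; u] + W x), with step Δt.
euler_step(x, u, W_in, W, Δt) = x .+ Δt .* (-x .+ tanh.(W_in * [1; u] .+ W * x))

# Discrete leaky-integrator ESN update with leak rate α.
leaky_update(x, u, W_in, W, α) = (1 - α) .* x .+ α .* tanh.(W_in * [1; u] .+ W * x)

rng = MersenneTwister(5)
N, D = 10, 2
W_in = randn(rng, N, D + 1)
W = 0.1 .* randn(rng, N, N)
x, u_n = randn(rng, N), randn(rng, D)
α = 0.3
```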
Warning
Draft / discussion only. No real implementation yet — the only commit is a docstring stub that errors at call time. Stacked on top of #450; I'll rebase onto master once that merges.
Hey @MartinuzziFrancesco, PR2 (#450) is sitting in good shape ("MERGEABLE/CLEAN", all 17 checks green), and per the roadmap I'd start PR3 (CTESN model + Mackey-Glass / Lorenz validation + docs) next. Before I write code I wanted to surface a handful of design questions, because the literature on CTESN diverges from the ESN-community defaults in a few non-obvious places. It's easier to align up front than to redo the implementation later.

The model
Anantharaman2021, "Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State Networks", defines the reservoir as

$$\dot{\mathbf{r}}(t) = f\big(A\,\mathbf{r}(t) + W_{hyb}\,\mathbf{x}(t)\big), \qquad \mathbf{x}(t) \approx g\big(W_{out}\,\mathbf{r}(t)\big)$$

with `f = tanh` and `g = id` hardcoded, `A` sparse random, and `W_hyb` random dense. No leak term, no bias. Their training procedure is a parametric surrogate: simulate the full model at `n` Sobol-sampled parameters, fit one `W_out` per sample by SVD, then RBF-interpolate `W_out(p)` across the parameter space (uses Surrogates.jl, mentioned explicitly).

This is genuinely different from the "leaky-integrator continuous ESN" that one naturally expects as a continuous limit of Lukosevicius2012: `ṙ = -α r + tanh(W_in u + W_r r + b)`. Several of my questions below stem from this gap. I read the canonical paper's PDF and skimmed Bhatnagar 2024 (LPCTESN / NLPCTESN follow-up); ground-truth sources are noted where they matter below.
Open questions
Q1 — Wrapping pattern: mirror ESN, or expose CTESN as the reservoir layer directly?
`ESN` is `@concrete struct ESN <: AbstractEchoStateNetwork{(:reservoir, :states_modifiers, :readout)}`, with the cell sitting in `:reservoir`. Two options for CTESN:

- (a) `CTESN <: AbstractEchoStateNetwork{(:reservoir, :states_modifiers, :readout)}`, with the `:reservoir` field holding a concrete `<: AbstractSciMLProblemReservoir` that PR2's `_collectstates(::AbstractSciMLProblemReservoir, …)` dispatches on. The user-facing call site is `CTESN(in_dims, res_dims, out_dims; …)`.
- (b) `CTESN <: AbstractSciMLProblemReservoir`, and the user does `ReservoirComputer(CTESN(...), modifiers, readout)` themselves (the way `SciMLProblemReservoir` is used in the PR2 tutorial).

(a) is more discoverable but adds a thin wrapper; (b) is closer to what PR2 already does. Your "built-in models get their own types" steer (issuecomment-4604259768) reads to me as (a), but I want to confirm.
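To make the two options concrete, a minimal sketch of the type layout. Every name here is a local stand-in defined for illustration (`ToyODEReservoir`, `ToyContinuousESN`, `collect_path`, `assemble`, and re-declared abstract types), not the ReservoirComputing.jl definitions; the point is only that in both (a) and (b) state collection dispatches on the same `AbstractSciMLProblemReservoir` subtype:

```julia
# Local stand-ins for the abstract types named in Q1 (illustration only).
abstract type AbstractEchoStateNetwork{Fields} end
abstract type AbstractSciMLProblemReservoir end

# Toy problem-backed cell: what PR2-style state collection would dispatch on.
struct ToyODEReservoir <: AbstractSciMLProblemReservoir
    in_dims::Int
    res_dims::Int
end

# Stand-in for the `_collectstates(::AbstractSciMLProblemReservoir, …)` dispatch.
collect_path(::AbstractSciMLProblemReservoir) = :sciml_problem_path

# Option (a): ESN-style wrapper; the cell lives in the `:reservoir` field.
struct ToyContinuousESN{R <: AbstractSciMLProblemReservoir, M, O} <:
       AbstractEchoStateNetwork{(:reservoir, :states_modifiers, :readout)}
    reservoir::R
    states_modifiers::M
    readout::O
end

ToyContinuousESN(in_dims::Integer, res_dims::Integer, out_dims::Integer) =
    ToyContinuousESN(ToyODEReservoir(in_dims, res_dims), (), zeros(out_dims, res_dims))

# Option (b): the cell is the exported type and the user composes the model;
# a NamedTuple stands in for `ReservoirComputer(cell, modifiers, readout)`.
assemble(cell::AbstractSciMLProblemReservoir, modifiers, readout) =
    (reservoir = cell, states_modifiers = modifiers, readout = readout)

model_a = ToyContinuousESN(3, 300, 3)
model_b = assemble(ToyODEReservoir(3, 300), (), zeros(3, 300))
```

Under this layout, (a) costs one struct plus a convenience constructor, while the solver-facing code path is shared.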
Q2 — Default ODE form: paper-canonical or leaky-integrator?
The paper-canonical form has no `α` and no bias. The leaky-integrator form is what most ESN users would expect from a "continuous ESN" type. My proposal:

- Default to the paper-canonical form (`leak_rate = 0.0`) for faithfulness to Anantharaman et al.
- Expose a `leak_rate::Real` kwarg so users can opt into the leaky form trivially.

OK with you, or should the leaky form be the default?
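A sketch of how the proposed default could look inside the in-place RHS planned for the implementation (pre-allocated buffers, `mul!` for both matvecs). `ToyCESNParams` and `cesn_rhs!` are hypothetical names; with `leak_rate = 0.0` and a zero bias the RHS reduces to ṙ = tanh(A r + W_hyb u(t)), and passing `leak_rate` opts into the leaky form. The `(du, u, p, t)` argument order is the standard in-place SciML ODE signature:

```julia
using LinearAlgebra, Random

# HYPOTHETICAL parameter bundle for a continuous ESN right-hand side.
struct ToyCESNParams{M <: AbstractMatrix, V <: AbstractVector, F}
    W_r::M              # reservoir matrix (A in Anantharaman et al.)
    W_in::M             # input matrix (W_hyb in Anantharaman et al.)
    b::V                # bias; zeros gives the paper-canonical form
    leak_rate::Float64  # 0.0 gives the paper-canonical form
    input::F            # t -> u(t), the driving signal
    buf_r::V            # pre-allocated buffer for W_r * r
    buf_u::V            # pre-allocated buffer for W_in * u(t)
end

function ToyCESNParams(W_r, W_in, input; leak_rate::Real = 0.0, b = zeros(size(W_r, 1)))
    n = size(W_r, 1)
    return ToyCESNParams(W_r, W_in, b, Float64(leak_rate), input, zeros(n), zeros(n))
end

# In-place RHS: ṙ = -leak_rate * r + tanh(W_r r + W_in u(t) + b)
function cesn_rhs!(dr, r, p::ToyCESNParams, t)
    mul!(p.buf_r, p.W_r, r)              # writes into the buffer, no allocation
    mul!(p.buf_u, p.W_in, p.input(t))    # input(t) itself may still allocate
    α, b, br, bu = p.leak_rate, p.b, p.buf_r, p.buf_u
    @. dr = -α * r + tanh(br + bu + b)
    return nothing
end

rng = MersenneTwister(7)
N, D = 50, 3
W_r = 0.1 .* randn(rng, N, N)
W_in = 0.1 .* randn(rng, N, D)
drive(t) = [sin(t), cos(t), sin(2t)]
p_paper = ToyCESNParams(W_r, W_in, drive)                   # leak_rate = 0.0, b = 0
p_leaky = ToyCESNParams(W_r, W_in, drive; leak_rate = 1.0)  # opt into the leaky form
r = randn(rng, N)
dr_paper, dr_leaky = similar(r), similar(r)
cesn_rhs!(dr_paper, r, p_paper, 0.3)
cesn_rhs!(dr_leaky, r, p_leaky, 0.3)
```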
Q3 — Sampler: does `TerminalStateSampling` cover forecasting?

For the forecasting flavour (single trajectory, fit `W_out` by `Y · R⁺`), we need one reservoir-state column per output column. Reading PR2's `_collectstates` carefully, the window-end `saveat` grid already produces exactly that (`n_samples = size(data, 2)` columns), so `TerminalStateSampling` looks sufficient; I don't think a `DenseTrajectorySampling` sampler is needed in PR3. Want to sanity-check that with you before assuming.

(The parametric-surrogate flavour from the paper would want dense trajectories. I'd defer that to a future PR; see Q6.)
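A sketch of the shape argument behind Q3, assuming a uniform window-end `saveat` grid starting at `t0` with step `Δt` (both assumptions) and a plain pseudoinverse readout `W_out = Y · R⁺`. The helper names are hypothetical, and the state matrix is random rather than produced by `_collectstates`:

```julia
using LinearAlgebra, Random

# HYPOTHETICAL: one save point at the end of each data window, so the
# state matrix R has exactly n_samples = size(data, 2) columns.
window_end_saveat(data; t0 = 0.0, Δt = 0.02) = t0 .+ Δt .* (1:size(data, 2))

# Linear readout W_out = Y R⁺ via the Moore-Penrose pseudoinverse.
fit_readout(Y::AbstractMatrix, R::AbstractMatrix) = Y * pinv(R)

rng = MersenneTwister(3)
res_dims, out_dims, n_samples = 40, 3, 200
data = randn(rng, out_dims, n_samples)
saveat = window_end_saveat(data)
R = randn(rng, res_dims, length(saveat))   # stand-in for terminal reservoir states
W_true = randn(rng, out_dims, res_dims)
Y = W_true * R                             # targets exactly linear in the states
W_out = fit_readout(Y, R)
```

Because `R` has one column per target column, `Y * pinv(R)` is well-shaped without any dense-trajectory sampling.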
Q4 — File placement: `src/models/ctesn.jl` or `src/extensions/ctesn.jl`?

The draft commit puts the stub under `src/extensions/ctesn.jl`, mirroring RECA (the only existing extension-required model). But CTESN is conceptually a sibling of ESN/EuSN/ES2N, which all live in `src/models/`. Happy to move it; what's your preference?

Q5 — Validation benchmarks
The paper validates CTESN on stiff systems (Robertson, HVAC, POLLU). My proposed target for PR3 is Mackey-Glass (τ = 17, Jaeger2001) and Lorenz '63 (NRMSE < 0.1 at N = 300), which are the ESN-community standards and have rich published baselines for sanity-checking (Pathak2017 is the Lorenz N = 300 reference). Neither benchmark has published CTESN numbers, so we'd be contributing the first.
Two paths:
I lean (i) for PR3 and adding (ii) as a follow-up, but I'll do (ii) in PR3 if you'd rather have the paper's own validation domain in there from day one.
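For the ESN-community benchmarks (path (i)), the Lorenz '63 reference data could be generated along these lines. A minimal sketch using the standard parameters (σ = 10, ρ = 28, β = 8/3) and classic fourth-order Runge-Kutta; the step Δt = 0.02, the initial condition, and the transient length are assumptions, and the actual tutorial would more likely use an OrdinaryDiffEq solver:

```julia
# Lorenz '63 vector field with the standard parameters.
function lorenz(x; σ = 10.0, ρ = 28.0, β = 8 / 3)
    return [σ * (x[2] - x[1]), x[1] * (ρ - x[3]) - x[2], x[1] * x[2] - β * x[3]]
end

# Classic fourth-order Runge-Kutta; returns a 3 × n_steps matrix, one column per step.
function rk4_trajectory(f, x0, Δt, n_steps)
    X = zeros(length(x0), n_steps)
    x = float.(x0)
    for n in 1:n_steps
        k1 = f(x)
        k2 = f(x .+ (Δt / 2) .* k1)
        k3 = f(x .+ (Δt / 2) .* k2)
        k4 = f(x .+ Δt .* k3)
        x = x .+ (Δt / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
        X[:, n] = x
    end
    return X
end

# ASSUMPTIONS: Δt = 0.02, x0 = [1, 1, 1], and the first 1000 steps discarded as transient.
traj = rk4_trajectory(lorenz, [1.0, 1.0, 1.0], 0.02, 6000)[:, 1001:end]
```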
Q6 — Scope: LPCTESN only, or also NLPCTESN / parametric surrogate?
Bhatnagar 2024 discusses LPCTESN (linear `W_out · r`) and NLPCTESN (k-NN polynomial-augmented RBF readout, ~3 orders of magnitude smaller reservoirs at the same accuracy on Robertson). The Anantharaman 2021 paper itself does the parametric-surrogate-over-parameter-space training (Surrogates.jl + RBF interpolation of `W_out(p)`).

PR3 as currently scoped covers forecasting + LPCTESN only. NLPCTESN needs a polynomial-feature readout + k-NN RBF (extra deps). Parametric-surrogate training needs Surrogates.jl. Both feel out of scope for one PR; defer to follow-ups?
Q7 — Spectral-norm warning (open Q5 from #397 / proposal)
The proven sufficient condition for the continuous-time echo state property is `‖A‖_2 < α` (operator 2-norm), not the discrete-time `ρ(A) < α`. The existing `scale_radius!` (`src/inits/inits_components.jl:93`) uses `maximum(abs.(eigvals(M)))`, which is the discrete bound. Three options for the doc warning:

- (a) `CTESN` docstring only ("treat ρ as empirical hyperparameter; the proven sufficient condition is `‖A‖_2 < α`").
- (b) `scale_radius!` docstring flagging that the bound it enforces is the discrete analogue.
- (c) `scale_opnorm!` for users who want the strict continuous bound.

(a) + (b) feels minimal and right; (c) adds API surface area I don't want to commit to without your nod.
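A sketch of the gap between the two bounds and of option (c). `spectral_radius` mirrors the `maximum(abs.(eigvals(M)))` check quoted above; `scale_opnorm!` is hypothetical and does not exist in the package. Since every eigenvalue satisfies |λ| ≤ ‖A‖₂, scaling to an operator-norm target also bounds ρ(A), typically with a lot of slack (for large i.i.d. Gaussian matrices the ratio ρ(A)/‖A‖₂ is roughly 1/2):

```julia
using LinearAlgebra, Random

# What the existing discrete-style check measures: ρ(A) = max |λᵢ(A)|.
spectral_radius(A) = maximum(abs, eigvals(A))

# HYPOTHETICAL option (c): rescale A in place so that ‖A‖₂ equals `target`.
function scale_opnorm!(A::AbstractMatrix, target::Real)
    A .*= target / opnorm(A, 2)
    return A
end

rng = MersenneTwister(11)
A = randn(rng, 200, 200)
ρ_A, norm_A = spectral_radius(A), opnorm(A, 2)   # ρ(A) ≤ ‖A‖₂ always holds

B = scale_opnorm!(copy(A), 0.9)
ρ_B = spectral_radius(B)                          # well below 0.9 for this matrix
```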
Q8 — NRMSE: inline or proper helper?
`grep -i nrmse src/` returns zero matches and `lib/ReservoirComputingBenchmarks/` is still an empty placeholder, so #414 hasn't landed yet. For PR3 I'll inline NRMSE (Lukoševičius 2012 eq 1: `sqrt(mean((ŷ-y)²) / var(y))` per output, averaged) in the test + tutorial code. If #414 lands first, I'll source it from there instead. Fine?

Q9 — Pacing
Are you OK with me starting PR3 now, stacked on `add-ode-reservoir-ext`, on the assumption that PR2 will land roughly as-is? If you'd rather I wait for PR2 to merge first, I'll park this draft and come back.

What I'll do once you reply
Convert this stub into the real implementation: a concrete CTESN type, an in-place ODE RHS with pre-allocated buffers (`mul!` for both matvecs, since the RHS gets called O(10²–10⁴) times per solve), a test file under `test/models/`, and a tutorial under `docs/src/tutorials/`. I'll rebase onto master after PR2 merges.

For reference / my own notes: the parallel research pass turned up three small factual things I'd like to flag for the project proposal too (the paper title is "Stiff Nonlinear Systems", not "Chaotic Dynamical Systems"; the canonical-paper authors are Anantharaman, Ma, Gowda, Laughman, Shah, Edelman, Rackauckas; and the 56× speedup figure in the proposal doesn't appear in the abstract). Happy to ping you separately about that rather than mix it into the PR thread.
Thanks!