Skip to content

[WIP/Draft] feat(models): ContinuousESN — PR3 (stacked on #450)#456

Draft
Saswatsusmoy wants to merge 3 commits into
SciML:masterfrom
Saswatsusmoy:add-ctesn
Draft

[WIP/Draft] feat(models): ContinuousESN — PR3 (stacked on #450)#456
Saswatsusmoy wants to merge 3 commits into
SciML:masterfrom
Saswatsusmoy:add-ctesn

Conversation

@Saswatsusmoy

Copy link
Copy Markdown
Contributor

Warning

Draft / discussion only. No real implementation yet — the only commit is a docstring stub that errors at call time. Stacked on top of #450; I'll rebase onto master once that merges.

Hey @MartinuzziFrancesco — PR2 (#450) is sitting in good shape ("MERGEABLE / CLEAN", all 17 checks green) and per the roadmap I'd start PR3 (CTESN model + Mackey-Glass / Lorenz validation + docs) next. Before I write code I wanted to surface a handful of design questions, because the literature on CTESN diverges from the ESN-community defaults in a few non-obvious places. Easier to align upfront than redo the implementation later.

The model

Anantharaman2021"Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State Networks" — defines the reservoir as

$$\dot{\mathbf{r}}(t) = \tanh\!\left(A\,\mathbf{r}(t) + W_{\text{hyb}}\,\mathbf{x}(t)\right)$$

with f = tanh, g = id hardcoded, A sparse random, W_hyb random dense. No leak term, no bias. Their training procedure is a parametric surrogate: simulate the full model at n Sobol-sampled parameters, fit one W_out per sample by SVD, then RBF-interpolate W_out(p) across the parameter space (uses Surrogates.jl, mentioned explicitly).

This is genuinely different from the "leaky-integrator continuous ESN" that one naturally expects as a continuous limit of Lukosevicius2012ṙ = -α r + tanh(W_in u + W_r r + b). Several of my questions below stem from this gap.

I read the canonical paper's PDF and skimmed Bhatnagar 2024 (LPCTESN / NLPCTESN follow-up); ground-truth sources noted where they matter below.

Open questions

Q1 — Wrapping pattern: mirror ESN, or expose CTESN as the reservoir layer directly?

ESN is @concrete struct ESN <: AbstractEchoStateNetwork{(:reservoir, :states_modifiers, :readout)} with the cell sitting in :reservoir. Two options for CTESN:

  • (a) Mirror ESN. CTESN <: AbstractEchoStateNetwork{(:reservoir, :states_modifiers, :readout)}, with the :reservoir field holding a concrete <: AbstractSciMLProblemReservoir that PR2's _collectstates(::AbstractSciMLProblemReservoir, …) dispatches on. User-facing call site is CTESN(in_dims, res_dims, out_dims; …).
  • (b) CTESN is the reservoir layer. CTESN <: AbstractSciMLProblemReservoir, and the user does ReservoirComputer(CTESN(...), modifiers, readout) themselves (the way SciMLProblemReservoir is used in the PR2 tutorial).

(a) is more discoverable but adds a thin wrapper; (b) is closer to what PR2 already does. Your "built-in models get their own types" steer (issuecomment-4604259768) reads to me as (a), but I want to confirm.

Q2 — Default ODE form: paper-canonical or leaky-integrator?

The paper-canonical form has no α and no bias. The leaky-integrator form is what most ESN users would expect from a "continuous ESN" type. My proposal:

  • Default to paper-canonical (leak_rate = 0.0) for faithfulness to Anantharaman et al.
  • Expose leak_rate::Real kwarg so users can opt into the leaky form trivially.

OK with you, or should the leaky form be the default?

Q3 — Sampler: does TerminalStateSampling cover forecasting?

For the forecasting flavour (single trajectory, fit W_out by Y · R⁺) we need one reservoir-state column per output column. Reading PR2's _collectstates carefully, the window-end saveat grid already produces exactly that (n_samples = size(data, 2) columns), so TerminalStateSampling looks sufficient — I don't think a DenseTrajectorySampling sampler is needed in PR3. Want to sanity-check that with you before assuming.

(The parametric-surrogate flavour from the paper would want dense trajectories. I'd defer that to a future PR — see Q6.)

Q4 — File placement: src/models/ctesn.jl or src/extensions/ctesn.jl?

The draft commit puts the stub under src/extensions/ctesn.jl, mirroring RECA (the only existing extension-required model). But CTESN is conceptually a sibling of ESN/EuSN/ES2N which all live in src/models/. Happy to move it — what's your preference?

Q5 — Validation benchmarks

The paper validates CTESN on stiff systems (Robertson, HVAC, POLLU). My proposal target for PR3 is Mackey-Glass (τ = 17, Jaeger2001) and Lorenz '63 (NRMSE < 0.1 at N = 300), which are the ESN-community standards and have rich published baselines for sanity-checking (Pathak2017 is the Lorenz-N=300 reference). Neither benchmark has published CTESN numbers — we'd be contributing the first.

Two paths:

  • (i) Land Mackey-Glass + Lorenz only (matches what's in the proposal, ships faster).
  • (ii) Add a stiff benchmark too (Robertson is the smallest — 3 equations) to demonstrate the paper's stated regime.

I lean (i) for PR3 and adding (ii) as a follow-up, but I'll do (ii) in PR3 if you'd rather have the paper's own validation domain in there from day one.

Q6 — Scope: LPCTESN only, or also NLPCTESN / parametric surrogate?

Bhatnagar 2024 discusses LPCTESN (linear W_out · r) and NLPCTESN (k-NN polynomial-augmented RBF readout, ~3 orders of magnitude smaller reservoirs at the same accuracy on Robertson). The Anantharaman 2021 paper itself does the parametric-surrogate-over-parameter-space training (Surrogates.jl + RBF interpolation of W_out(p)).

PR3 as currently scoped covers forecasting + LPCTESN only. NLPCTESN needs a polynomial-feature readout + k-NN RBF (extra deps). Parametric-surrogate training needs Surrogates.jl. Both feel out-of-scope for one PR — defer to follow-ups?

Q7 — Spectral-norm warning (open Q5 from #397 / proposal)

The continuous-time echo state property needs ‖A‖_2 < α (operator 2-norm), not the discrete-time ρ(A) < α. Existing scale_radius! (src/inits/inits_components.jl:93) uses maximum(abs.(eigvals(M))) — that's the discrete bound. Three options for the doc warning:

  • (a) Inline note in CTESN docstring only ("treat ρ as empirical hyperparameter; the proven sufficient condition is ‖A‖_2 < α").
  • (b) Add a paragraph to scale_radius! docstring flagging that the bound it enforces is the discrete analogue.
  • (c) Add a new helper scale_opnorm! for users who want the strict continuous bound.

(a) + (b) feels minimal and right; (c) adds API surface area I don't want to commit to without your nod.

Q8 — NRMSE: inline or proper helper?

grep -i nrmse src/ returns zero matches and lib/ReservoirComputingBenchmarks/ is still an empty placeholder, so #414 hasn't landed yet. For PR3 I'll inline NRMSE (Lukoševičius 2012 eq 1: sqrt(mean((ŷ-y)²) / var(y)) per output, averaged) in the test + tutorial code. If #414 lands first I'll source it from there instead. Fine?

Q9 — Pacing

Are you OK with me starting PR3 now stacked on add-ode-reservoir-ext, on the assumption that PR2 will land roughly as-is? If you'd rather I wait for PR2 to merge first, I'll park this draft and come back.


What I'll do once you reply

Convert this stub into the real implementation: concrete CTESN type, in-place ODE RHS with pre-allocated buffers (mul! for both matvecs — the RHS gets called O(10²–10⁴) times per solve), test file under test/models/, and a tutorial under docs/src/tutorials/. I'll rebase onto master after PR2 merges.

For reference / my own notes — the parallel research pass turned up three small factual things I'd like to flag for the project proposal too (paper title is "Stiff Nonlinear Systems" not "Chaotic Dynamical Systems", the canonical-paper authors are Anantharaman, Ma, Gowda, Laughman, Shah, Edelman, Rackauckas, and the 56× speedup figure in the proposal doesn't appear in the abstract — happy to ping you separately about that rather than mix it into the PR thread).

Thanks!

@Saswatsusmoy

Saswatsusmoy commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @MartinuzziFrancesco — both the §3.2.6 pointer and the "skip the struct, run it raw" suggestion turned out to land in interesting places.

On §3.2.6 (Lukoševičius 2012, "Leaking Rate")

The relevant ODE is eq (5):

$$\dot{\mathbf{x}} = -\mathbf{x} + \tanh\!\left(\mathbf{W}^{\text{in}}[1;\mathbf{u}] + \mathbf{W}\,\mathbf{x}\right)$$

— no α in the continuous form. The leaking rate appears only when you Euler-discretise: take Δx/Δt ≈ ẋ with step Δt = α and you recover the discrete leaky update x(n+1) = (1-α)x(n) + α·tanh(W_in[1;u] + Wx(n)). So α is the discretisation step, not a separate parameter of the ODE. Equivalently you can fold it into eq (5) as a LHS time constant α·ẋ = … if you want to keep Δt ≡ 1. The section also notes this form is preferred over alternatives "because it guarantees that x(n) never goes outside the (-1, 1) interval" — bounded state by construction, no extra contraction argument needed. Practical advice: tune α to the timescale of u/y_target; can be per-neuron for multi-timescale tasks.

This reframes PR3 quite a bit. The Anantharaman CTESN paper uses ṙ = tanh(Ar + W_hyb x) with no decay term and trains a parametric surrogate via RBF interpolation of W_out(p) — a different approach aimed at stiff systems. The "continuous ESN" §3.2.6 points at is just eq (5), which is exactly what the Lorenz eye-test in PR #450's tutorial implements verbatim. So PR3 collapses from "build a CTESN model mirroring the paper" → "thin convenience constructor around SciMLProblemReservoir that pre-bakes eq (5)". Several of the open questions in the original PR body fall away.

Smoke test — eq (5) raw on Lorenz, no CTESN struct

Ran the tutorial pattern scaled to N_res = 300, train = 5000, predict = 1250 samples (dt = 0.02, σ = 10, ρ = 28, β = 8/3). Spectral scaling ‖W_r‖₂ = 0.9 (continuous-ESP-sufficient bound), ‖W_in‖ ≈ 0.1, ‖b‖ ≈ 0.05, MersenneTwister(17), Tsit5 at reltol = 1e-6. Happy to inline the full script as a follow-up comment if useful.

First pass, train NRMSE = 0.0004 (readout fits beautifully) but autoregressive rollout NRMSE was ~1.5 at all horizons. Tracked it down to the autoregressive _predict path in ext/RCODEReservoirExt.jl:

current_state = res.prob.u0   # → zeros(N) every time

The trained reservoir's terminal state never makes it into the predict-time u0. So the rollout starts cold and produces nothing useful regardless of how well the readout was trained. After manually re-running collectstates, grabbing the last column, and rebuilding the reservoir with remake(prob; u0 = terminal_state), the numbers became:

Horizon (Lyapunov times) Steps NRMSE
1 t_λ 55 0.157
2 t_λ 110 0.118
3 t_λ 166 0.112
4 t_λ 221 0.210
6 t_λ 331 0.729
8 t_λ 442 0.981

So eq (5) forecasts Lorenz at N=300 to ~3 Lyapunov times under NRMSE ≈ 0.11 before chaotic divergence dominates. Not quite our PR3 target of NRMSE < 0.1, but within striking distance of it before any hyperparameter tuning (spectral radius vs spectral norm, leak/time-constant, washout length, regularisation). I think the < 0.1 target is reachable.

What I'd like your read on:

  1. The warm-up hole. Is this expected for the current PR feat: continuous-time reservoirs via RCODEReservoirExt #450 predict semantics — i.e., users are meant to remake(prob; u0 = …) themselves before calling predict? Or is this something that should be wired in (e.g. predict(rc, steps, ps, st; initialdata, warmup_data = nothing) that internally runs collectstates on warmup_data and seeds u0)? The current behaviour is correct but really easy to footgun on, as I just demonstrated.

  2. PR3 reframe. Given §3.2.6, would you rather PR3 be:

    • (a) Drop the "CTESN" name entirely and ship a LeakyContinuousESN(in_dims, res_dims, out_dims; α, …) thin wrapper that pre-bakes eq (5) — semantically a sibling of ESN. The Anantharaman parametric-surrogate flavour would be a future PR with its own type.
    • (b) Keep the CTESN name but make it the eq-(5) wrapper, and treat the parametric-surrogate version as a future extension within the same family.
    • (c) Something else entirely.
  3. Validation. Should the in-PR test assert specifically against the eq-(5) Lorenz NRMSE (after tuning to push below 0.1 at some agreed horizon — e.g. 4 t_λ), or do you want a tighter set of unit tests + Mackey-Glass + Lorenz in the docs as eye-tests only?

Will hold off on writing real code until you've had a chance to push back on any of this — easier to redo the design now than after I've committed 600 lines.

@MartinuzziFrancesco

Copy link
Copy Markdown
Collaborator

Thanks for running the experiments, they are quite interesting and I think necessary even for merging #450. To address you point number 1) I think having the two code snippets side to side would help in understanding if this is something that could be a gotcha for the user, and therefore if we should find a way to deal with it in a elegant manner. As far as 2), I think we can go with ContinuousESN thin wrapper of eq 5). And as far as validation I think we can go with the eye test of attractor reconstruction following the example of the readme. If you want a more qualitative test we can use the difference in fractal dimensions using FractalDimensions's Grassberger-Procaccia algorithm. I would say that this is a bit more solid wrt to vpt, and it should be enough for PR3

@Saswatsusmoy

Copy link
Copy Markdown
Contributor Author

Thanks @MartinuzziFrancesco. Re-ran both paths from scratch for the side-by-side, and lining up the other two answers below.

Cold vs warmed predict — side-by-side

Both scripts are identical except for the warmup step. Reproducible numbers on Julia 1.12.5 with MersenneTwister(17), N = 300, train = 5000, predict = 1250, dt = 0.02, Tsit5 at reltol = 1e-6, ‖W_r‖₂ = 0.9, ‖W_in‖ ≈ 0.1.

lorenz_cold.jl — predict's u0 is whatever the user gave the ODEProblem (here zeros(N))
import Pkg; Pkg.activate("/tmp/lorenz_env"; io = devnull)
using Random, Statistics, LinearAlgebra, Printf
using ReservoirComputing, SciMLBase, DataInterpolations, OrdinaryDiffEqTsit5

const SEED, N_RES = 17, 300
const TRAIN_LEN, PREDICT_LEN, DT, SHIFT = 5000, 1250, 0.02, 300
const T_LYAPUNOV = 1 / 0.9056

lorenz!(du, u, p, t) = begin
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end
data_prob = ODEProblem(lorenz!, [1.0, 0.0, 0.0],
    (0.0, (SHIFT + TRAIN_LEN + PREDICT_LEN + 100) * DT), [10.0, 28.0, 8 / 3])
data = Array(solve(data_prob, Tsit5(); saveat = DT))
input_data  = data[:, SHIFT:(SHIFT + TRAIN_LEN - 1)]
target_data = data[:, (SHIFT + 1):(SHIFT + TRAIN_LEN)]
test_data   = data[:, (SHIFT + TRAIN_LEN):(SHIFT + TRAIN_LEN + PREDICT_LEN - 1)]

rng = MersenneTwister(SEED); Random.seed!(SEED)
Wr_raw = randn(rng, N_RES, N_RES) ./ sqrt(N_RES)
Wr  = (0.9 / opnorm(Wr_raw)) .* Wr_raw
Win = 0.1 .* randn(rng, N_RES, 3)
b   = 0.05 .* randn(rng, N_RES)

esn_rhs!(dx, x, p, t) = (dx .= .-x .+ tanh.(p.Wr * x .+ p.Win * p.input(t) .+ p.b); nothing)

function build_rc(tspan_len, u0)
    prob = ODEProblem(esn_rhs!, u0, (0.0, Float64(tspan_len)), (Wr = Wr, Win = Win, b = b))
    res = SciMLProblemReservoir(prob, TerminalStateSampling(),
        (0.0, Float64(tspan_len)), Tsit5(); reltol = 1.0e-6, abstol = 1.0e-8)
    return ReservoirComputer(res, LinearReadout(N_RES => 3))
end

rc_train = build_rc(TRAIN_LEN, zeros(N_RES))
ps, st = setup(rng, rc_train)
ps, st = train!(rc_train, input_data, target_data, ps, st)

# COLD: predict's u0 stays at the prob's original u0 (zeros).
rc_pred = build_rc(PREDICT_LEN, zeros(N_RES))
ps_pred, st_pred = setup(rng, rc_pred)
ps_pred = merge(ps_pred, (readout = ps.readout,))
st_pred = merge(st_pred, (readout = st.readout,))
output, _ = predict(rc_pred, PREDICT_LEN, ps_pred, st_pred; initialdata = test_data[:, 1])

nrmse(yhat, y) = mean(sqrt(mean((yhat[i, :] .- y[i, :]) .^ 2)) / std(y[i, :]) for i in 1:size(y, 1))
for lt in (1.0, 2.0, 3.0, 4.0, 6.0, 8.0)
    h = min(round(Int, lt * T_LYAPUNOV / DT), PREDICT_LEN)
    @printf("  horizon %.1f t_λ  (%4d steps): NRMSE = %.4f\n", lt, h, nrmse(output[:, 1:h], test_data[:, 1:h]))
end
lorenz_warm.jl — same script, with the warmup inserted before build_rc(PREDICT_LEN, …)
# … everything above identical to lorenz_cold.jl through `train!` …

# WARMUP: re-run collectstates, grab terminal reservoir state.
trained_states, _ = collectstates(rc_train, input_data, ps, st)
warm_u0 = collect(trained_states[:, end])

# WARMED: predict's u0 = trained reservoir terminal state.
rc_pred = build_rc(PREDICT_LEN, warm_u0)
ps_pred, st_pred = setup(rng, rc_pred)
ps_pred = merge(ps_pred, (readout = ps.readout,))
st_pred = merge(st_pred, (readout = st.readout,))
output, _ = predict(rc_pred, PREDICT_LEN, ps_pred, st_pred; initialdata = test_data[:, 1])

# … NRMSE loop identical to lorenz_cold.jl …

The semantic delta is exactly: collectstates once after train!, take trained_states[:, end], rebuild the predict reservoir with that as u0. Four added lines, one swapped line.

Results (autoregressive rollout NRMSE per Lukoševičius 2012 eq 1, mean over channels of RMSE / std):

Horizon Cold (u0 = 0) Warmed (u0 = trained terminal)
1 t_λ (55 steps) 1.4893 0.1568
2 t_λ (110 steps) 1.4960 0.1183
3 t_λ (166 steps) 1.7431 0.1118
4 t_λ (221 steps) 1.6959 0.2099
6 t_λ (331 steps) 1.6444 0.7294
8 t_λ (442 steps) 1.4708 0.9808

Train NRMSE (teacher-forced one-step on training data) was 0.0004 in both cases — the readout fits cleanly; the cold path's failure is purely the predict-time u0.

Two small caveats on what I ran:

  • Eq (5) bundles bias as the leading column of W_in[1;u]; my script splits it into a separate +b term — algebraically identical, just different parameter layout.
  • warm_u0 = trained_states[:, end] is the post-modifier state if any state_modifiers are wired up. None here, so it equals the raw reservoir terminal. If a real warmup API were to expose this, "warm from the raw reservoir terminal" and "warm from the modified terminal" would need to be a deliberate choice.

If you read this as a real footgun, the lightest-touch fix I can see is a warmup_data::Union{Nothing, AbstractMatrix} = nothing kwarg on the SciMLProblemReservoir paths of predict, where a non-nothing value triggers collectstates(rc, warmup_data, ps, st) internally and seeds u0 from the last column. No behaviour change for users who don't pass it. Happy to follow your lead on whether this lands in #450 or in PR3's wrapper.

On the naming

ContinuousESN works for me. I'll rename the stub on add-ctesn → new branch + new file + re-titled PR. The Anantharaman parametric-surrogate flavour can be its own type in its own PR if anyone needs it.

On validation via Grassberger-Procaccia

Will pull in FractalDimensions.jl (v1.9.6 as of writing). Plan is grassberger_proccacia_dim(StateSpaceSet(...)) on both the true Lorenz attractor and the warmed-rollout attractor, then assert the predicted dimension matches reference (Lorenz '63 correlation dimension is reported at ≈ 2.05 in the literature) to some agreed tolerance, alongside the attractor-reconstruction eye-test from the README. I won't quote actual fractal-dim numbers from my run until I've actually computed them — flagging the plan rather than the result.

Saswatsusmoy added a commit to Saswatsusmoy/ReservoirComputing.jl that referenced this pull request Jun 17, 2026
Francesco's call on SciML#456: drop the CTESN name (which referred to the
Anantharaman 2021 parametric-surrogate paper) and ship instead a thin
wrapper around the Lukoševičius 2012 §3.2.6 eq (5) leaky-integrator
continuous ESN. Stub file moves from `src/extensions/` to `src/models/`
to sit alongside ESN as a semantic sibling; docstring rewritten to
reflect the new scope (leaky-integrator default, parametric-surrogate
CTESN explicitly out of scope).
@Saswatsusmoy Saswatsusmoy changed the title [WIP/Draft] feat(models): CTESN — PR3 design discussion (stacked on #450) [WIP/Draft] feat(models): ContinuousESN — PR3 (stacked on #450) Jun 17, 2026
Skeleton-only placeholder so the draft PR exists as a discussion
vehicle. No include, no export, no tests — the body errors with a
clear message directing the user to the (not-yet-implemented) extension
constructor. Open design questions are tracked in the PR description.

Stacked on PR SciML#450; will be rebased onto master once that merges.
Francesco's call on SciML#456: drop the CTESN name (which referred to the
Anantharaman 2021 parametric-surrogate paper) and ship instead a thin
wrapper around the Lukoševičius 2012 §3.2.6 eq (5) leaky-integrator
continuous ESN. Stub file moves from `src/extensions/` to `src/models/`
to sit alongside ESN as a semantic sibling; docstring rewritten to
reflect the new scope (leaky-integrator default, parametric-surrogate
CTESN explicitly out of scope).
Prior commit captured only the file rename — the docstring content
update was on disk but not staged. This commit folds in the actual
docstring rewrite: references Lukoševičius 2012 §3.2.6 eq (5) as the
canonical ODE, explains how the discrete leak rate α maps to the
continuous Euler step, and explicitly distinguishes from the
Anantharaman 2021 parametric-surrogate CTESN.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants