Add ReservoirComputingBenchmarks library#414
Conversation
Initial implementation of the benchmarking library in lib/ as discussed in SciML#398. Implements three core tasks: - Memory Capacity (Jaeger 2002) - NARMA target generation and evaluation (Atiya & Parlos 2000) - Information Processing Capacity (Dambre et al. 2012) Model-agnostic API taking raw input/states arrays. Uses Cholesky-based ridge regression with factorization reuse for performance. Zero external dependencies (LinearAlgebra + Statistics only).
There was a problem hiding this comment.
Pull request overview
Introduces a new standalone Julia library under lib/ReservoirComputingBenchmarks implementing reservoir computing benchmark tasks (Memory Capacity, NARMA, and Information Processing Capacity) with a model-agnostic (input, states) API and a small stdlib-only implementation.
Changes:
- Adds benchmark implementations:
memory_capacity,generate_narma/narma, andipc(Legendre-basis IPC). - Adds shared utilities (ridge regression helpers, normalization, metrics, train/test split).
- Adds a comprehensive unit test suite and package metadata (
Project.toml).
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/ReservoirComputingBenchmarks/Project.toml | Declares new package, stdlib deps, and test targets/compat. |
| lib/ReservoirComputingBenchmarks/src/ReservoirComputingBenchmarks.jl | Defines module, imports stdlib deps, includes source files, exports API. |
| lib/ReservoirComputingBenchmarks/src/utils.jl | Implements ridge regression factor/solve, correlation/NMSE, splitting, normalization. |
| lib/ReservoirComputingBenchmarks/src/memory_capacity.jl | Implements Jaeger (2002) memory capacity using reusable ridge factorization. |
| lib/ReservoirComputingBenchmarks/src/narma.jl | Implements NARMA target generation and evaluation with multiple metrics. |
| lib/ReservoirComputingBenchmarks/src/ipc.jl | Implements Dambre et al. (2012) IPC with Legendre basis and basis enumeration. |
| lib/ReservoirComputingBenchmarks/test/runtests.jl | Adds unit + integration-like tests for all benchmarks and utilities. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| G = Symmetric(X' * X) | ||
| G_reg = Matrix(G) + reg * Matrix{eltype(X)}(I, n, n) | ||
| F = cholesky(Symmetric(G_reg)) | ||
| buf = Vector{eltype(X)}(undef, n) | ||
| return _RidgeFactor(F, buf) |
There was a problem hiding this comment.
_ridge_factor mixes eltype(X) with reg::Real when forming G_reg. If reg is not the same type as eltype(X) (e.g. default reg=1.0 with Float32 states), G_reg and the Cholesky factor promote to a different element type than buf = Vector{eltype(X)}, causing a construction type mismatch for _RidgeFactor (or later mul!/solve issues). Consider promoting once (e.g. T = promote_type(eltype(X), typeof(reg))), converting X/reg accordingly, and allocating both the factor and Xty_buf with the same T.
| G = Symmetric(X' * X) | |
| G_reg = Matrix(G) + reg * Matrix{eltype(X)}(I, n, n) | |
| F = cholesky(Symmetric(G_reg)) | |
| buf = Vector{eltype(X)}(undef, n) | |
| return _RidgeFactor(F, buf) | |
| T = promote_type(eltype(X), typeof(reg)) | |
| reg_T = convert(T, reg) | |
| X_T = Matrix{T}(X) | |
| G = Symmetric(X_T' * X_T) | |
| G_reg = Matrix{T}(G) | |
| @inbounds for i in 1:n | |
| G_reg[i, i] += reg_T | |
| end | |
| F = cholesky(Symmetric(G_reg)) | |
| buf = Vector{T}(undef, n) | |
| return _RidgeFactor{T}(F, buf) |
| n = size(X, 2) | ||
| G = Symmetric(X' * X) | ||
| G_reg = Matrix(G) + reg * Matrix{eltype(X)}(I, n, n) | ||
| F = cholesky(Symmetric(G_reg)) |
There was a problem hiding this comment.
_ridge_factor uses cholesky(Symmetric(G_reg)) without handling the common case where reg == 0 (or very small) and X'X is not positive definite (e.g. rank-deficient features). This will throw a PosDefException that may surprise users of the public APIs. Consider either (a) requiring reg > 0 with a clear error, (b) adding a small jitter when reg == 0, or (c) catching PosDefException and rethrowing an ArgumentError explaining how to fix it (increase reg / provide full-rank data).
| F = cholesky(Symmetric(G_reg)) | |
| F = try | |
| cholesky(Symmetric(G_reg)) | |
| catch e::LinearAlgebra.PosDefException | |
| throw(ArgumentError("Failed to compute Cholesky factorization of X'X + reg * I: " * | |
| "the matrix is not positive definite. This can happen when the " * | |
| "features are rank-deficient or when reg is zero or too small. " * | |
| "Consider increasing reg to a positive value or providing full-rank data. " * | |
| "Original error: $(e)")) | |
| end |
| function _nmse(y_true::AbstractVector, y_pred::AbstractVector) | ||
| v = var(y_true) | ||
| if v < eps(eltype(y_true)) | ||
| @warn "NMSE: target variance is near-zero ($v). " * | ||
| "NMSE is undefined for constant targets." | ||
| return NaN | ||
| end |
There was a problem hiding this comment.
_nmse compares v against eps(eltype(y_true)). If y_true is not a floating-point vector (e.g. Vector{Int}), eps(eltype(y_true)) will throw a MethodError even though var(y_true) returns a float. Consider using eps(typeof(v)) or eps(float(eltype(y_true))) (or simply v ≤ eps(v)) so this works for any AbstractVector of reals.
| @test haskey(result, :total) | ||
| @test haskey(result, :delays) | ||
| @test length(result.delays) == 20 | ||
| @test all(0 .<= result.delays .<= 1.001) |
There was a problem hiding this comment.
This chained broadcasted comparison doesn’t test the upper bound as intended: 0 .<= result.delays .<= 1.001 is evaluated left-to-right, and the second .<= 1.001 compares a Bool array to 1.001 (always true). Use an explicit conjunction such as (0 .<= result.delays) .& (result.delays .<= 1.001) (or two separate all(...) checks).
| @test all(0 .<= result.delays .<= 1.001) | |
| @test all((0 .<= result.delays) .& (result.delays .<= 1.001)) |
| xn = ReservoirComputingBenchmarks._normalize(x, 0.0, 0.5) | ||
| @test xn[1] ≈ 0.0 | ||
| @test xn[end] ≈ 0.5 | ||
| @test all(0 .<= xn .<= 0.5) |
There was a problem hiding this comment.
Same issue as earlier: 0 .<= xn .<= 0.5 is parsed left-to-right, so the second comparison is applied to a Bool array and doesn’t enforce the upper bound. Use (0 .<= xn) .& (xn .<= 0.5) (or separate all(...) checks).
| @test all(0 .<= xn .<= 0.5) | |
| @test all((0 .<= xn) .& (xn .<= 0.5)) |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa299539e1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| G_reg = Matrix(G) + reg * Matrix{eltype(X)}(I, n, n) | ||
| F = cholesky(Symmetric(G_reg)) | ||
| buf = Vector{eltype(X)}(undef, n) | ||
| return _RidgeFactor(F, buf) |
There was a problem hiding this comment.
Use promoted type for ridge scratch buffer
_ridge_factor builds G_reg with reg (default 1.0), which promotes Float32 state matrices to a Float64 Cholesky factor, but buf is still allocated as Vector{eltype(X)}. Constructing _RidgeFactor(F, buf) then fails with a type mismatch for common Float32 inputs, so memory_capacity, narma, and ipc can error out on default RC data types.
Useful? React with 👍 / 👎.
| @assert reg >= 0 "Regularization coefficient must be non-negative, got $reg" | ||
| n = size(X, 2) | ||
| G = Symmetric(X' * X) | ||
| G_reg = Matrix(G) + reg * Matrix{eltype(X)}(I, n, n) | ||
| F = cholesky(Symmetric(G_reg)) |
There was a problem hiding this comment.
Disallow zero ridge penalty with Cholesky solve
The API explicitly allows reg == 0, but the implementation always calls unpivoted cholesky(X'X + reg*I). When reg is zero and X is rank-deficient (e.g., more features than training samples, which is common in RC benchmarks), X'X is singular and this throws PosDefException instead of producing a result. Either enforcing reg > 0 or switching to a solver that handles semidefinite systems would avoid this runtime failure.
Useful? React with 👍 / 👎.
- Fix type promotion in _ridge_factor: promote eltype(X) and typeof(reg) to a common type for Cholesky factor and scratch buffer - Catch PosDefException in _ridge_factor with clear error message when reg is too small for rank-deficient data - Fix eps check in _nmse to use typeof(v) instead of eltype(y_true) so it works for non-float input vectors - Fix chained broadcast comparisons in tests that silently skipped upper bound checks
|
I have went through the code and I'll provide some detailed comments tomorrow. The implementations look correct, but there are a couple of details were the code could be made more flexible. In the meantime I am thinking that you probably were right in the issue and the benchmarks could be more linked to the models, since that would reduce the code reuse. Without rewriting the functions for the moment, what do you think the signatures of the functions would be like? |
|
For the model-integrated signatures, I was thinking we keep the existing raw-array methods as the base and add dispatch methods that accept RC.jl models directly. Something like: # existing (stays as-is, model-agnostic)
memory_capacity(input::AbstractVector, states::AbstractMatrix; max_delay=30, reg=1.0)
# new dispatch — handles input generation + state collection internally
memory_capacity(model::AbstractReservoirComputer, ps, st;
T=3000, rng=Random.default_rng(), max_delay=30, reg=1.0)The model method would generate the input signal, call collectstates(model, reshape(input, 1, :), ps, st) to get the states, then delegate to the raw-array method. Same pattern for narma and ipc: narma(model::AbstractReservoirComputer, ps, st;
T=3000, rng=Random.default_rng(), order=10, metric=:nmse, reg=1.0)
ipc(model::AbstractReservoirComputer, ps, st;
T=3000, rng=Random.default_rng(), max_delay=10, max_degree=3, reg=1.0)So the user workflow becomes just model = ESN(1, 100, 1)
ps, st = setup(rng, model) mc = memory_capacity(model, ps, st; T=3000, max_delay=50) For the dependency question I think this could work nicely as a package extension that activates when ReservoirComputing is loaded, so the base package stays zero-dependency and the RC.jl integration is opt-in. What do you think? |
|
yeha I think that that would be a nice way to approach it. I will go over the code and provide you with some comments |
|
Any updates on this? |
| score = if metric === :nmse | ||
| _nmse(y_test, y_pred) | ||
| elseif metric === :rnmse | ||
| nmse_val = _nmse(y_test, y_pred) | ||
| nmse_val < 0 ? 0.0 : sqrt(nmse_val) | ||
| elseif metric === :mse | ||
| mean((y_test .- y_pred) .^ 2) | ||
| else | ||
| throw(ArgumentError("Unknown metric :$metric. Use :nmse, :rnmse, or :mse.")) | ||
| end |
There was a problem hiding this comment.
you can avoid this by letting metric accept any function with a (test, pred) signature. This way the function can also support any metric the user wants to use
|
I was offline the past few days, I'm testing the code at the moment |
- Change metric from Symbol to callable function with signature (y_true, y_pred) -> score - Export built-in metrics nmse, rnmse, mse as public convenience functions - Remove metric field from narma() return value (no longer needed) - Update docstring with examples for built-in and custom metrics - Update tests to use function references instead of Symbols - Add test demonstrating custom metric usage Addresses Francesco's review feedback on PR SciML#414
- Change narma() metric from Symbol to callable with (y_true, y_pred) -> score - Export nmse, rnmse, mse as public convenience functions - Remove metric field from narma() return value - Replace NARMA_COEFFICIENTS Dict with compile-time resolved NamedTuples - Use mul!/copytri!/cholesky! to eliminate redundant matrix allocations in _ridge_factor - Add _ridge_solve! with pre-allocated w_buf for zero-alloc hot loop solves - Use copyto! + views instead of collect/array slicing in MC and IPC - Replace Dict with Matrix for IPC poly_cache (array indexing vs hash lookup) - Pre-allocate y_pred buffer in IPC to eliminate per-basis allocations - Fix rnmse to explicitly handle NaN from zero-variance input
ef98f3a to
fad4314
Compare
…ge extension Implements the five missing benchmarks from issue SciML#398 (nonlinear transformation, nonlinear memory, sin approximation, kernel rank, generalization rank) alongside a package extension that adds (model, ps, st) dispatch to all eight benchmarks. The extension activates when ReservoirComputing is loaded, generating synthetic uniform[-1, 1] driving signals, collecting reservoir states via collectstates, and forwarding to the array-based methods. The base library remains zero-dependency. The model dispatch validates in_dims == 1 with a friendly ArgumentError and reuses the user-provided initial state across runs for kernel and generalization rank so the streams are independent. Tests cover array form, happy path, validation errors, edge cases, reproducibility, deterministic generalization-rank invariant (perturbation == 0 -> rank 1), and cross-model dispatch on ES2N and EuSN. 180/180 pass.
|
Pushed the rest of issue #398 in 3b84876. All eight benchmarks now have an array form and a model dispatch through a package extension that activates when ReservoirComputing is loaded. The base library stays zero dep so anyone can use it without pulling in RC.jl. The extension adds (model, ps, st) dispatch on memory_capacity, nonlinear_memory, nonlinear_transformation, sin_approximation, narma, ipc, kernel_rank and generalization_rank. It generates a uniform [-1, 1] driving signal of length T, collects states via collectstates, and forwards to the array form. There is also a friendly in_dims == 1 check so users get a clear error instead of an opaque shape mismatch. Test count is 180 covering happy path, validation errors, reproducibility, a deterministic generalization rank invariant (perturbation 0 gives rank 1) and cross model dispatch on ES2N and EuSN. A few things I left out so this PR stays focused. Flagging them as follow ups so we can plan them separately.
Let me know which of these you want me to take on next and which you want to defer. |
| target = Vector{Float64}(undef, T_valid) | ||
| y_pred = Vector{Float64}(undef, n_test) |
There was a problem hiding this comment.
don't force Float64 if you can infer it from input/states
| by_degree = Dict{Int, Float64}() | ||
| by_delay = Dict{Int, Float64}() |
There was a problem hiding this comment.
don't force Float64 if you can infer it from input/states
| # Valid time range (after max_delay to avoid edge effects) | ||
| valid = (max_delay + 1):T | ||
| T_valid = length(valid) | ||
| X = Matrix{Float64}(undef, T_valid, size(states, 1)) |
There was a problem hiding this comment.
don't force Float64 if you can infer it from input/states
| @assert size(states, 2) == T "states must have $T columns (time steps), got $(size(states, 2))" | ||
| @assert max_delay >= 1 "max_delay must be >= 1, got $max_delay" | ||
| @assert max_degree >= 1 "max_degree must be >= 1, got $max_degree" | ||
| @assert max_delay < T "max_delay ($max_delay) must be less than signal length ($T)" |
There was a problem hiding this comment.
instead of assertions I would make more use of throw(ArgumentError(...)) or DimensionMismatch(...). making clear in the docs what the dims should be and then throwing the errors should be a cleaner path. I know I'm using them a bunch in the main library, but that should change too
|
I would say that the library is in a good stage and ready to be merged. I do have a couple of comments, which I submitted as a review but I see them as more general, since they appear in more functions:
|
…match
`@assert` is intended for internal invariants and can be disabled by
`--check-bounds=no`/optimization passes. User-facing input validation
should raise typed exceptions so callers can dispatch on them and so the
errors are not silently optimized away.
- Argument-value violations (out-of-range scalars, missing kwargs,
non-positive sizes) now raise `ArgumentError`.
- Shape/length mismatches now raise `DimensionMismatch`.
- Tests updated to match the new exception types.
Addresses review feedback on PR SciML#414 (use specific errors instead of
assertions for clearer user-facing failures).
…ng Float64
Allocations such as `Matrix{Float64}(undef, ...)`, `Vector{Float64}(...)`
and `zeros(...)` ignored the caller's element type and silently promoted
`Float32` (and other) inputs to `Float64`. Each benchmark now derives a
single computational type
Telt = promote_type(eltype(input), eltype(states), typeof(reg))
and uses it for every internal buffer (design matrix, target buffers,
capacity vectors, IPC dictionaries). The NARMA target generator promotes
across the input and the four recurrence coefficients instead.
Metric and utility helpers (`_squared_correlation`, `_nmse`, `rnmse`,
`_normalize`) were also updated to be type-preserving so that storing
their results back into the new typed buffers never silently downcasts.
Docstrings now document the element-type rule in place of the previous
explicit `Float64` annotation.
Addresses review feedback on PR SciML#414 (don't force `Float64`; infer from
input/states).
…metric The `metric` keyword previously carried a `::MetricFunction` annotation (an alias for `Function`). It excluded user-defined callable structs and provided no real type safety, since `Function` is already the supertype of every function. Remove the annotation so any object satisfying the `(y_true, y_pred) -> score` contract is accepted, and delete the now unused `MetricFunction` constant from the module. Addresses review feedback on PR SciML#414 (leave the `metric` argument untyped).
|
Hi @MartinuzziFrancesco — pushed three commits (
All 180 tests still pass, Runic format is clean. DisclosureFull transparency: these three commits were drafted with an AI agent harness (Claude Code) rather than written by hand. I scoped the work this way because the changes are mechanical and low-risk — pure refactors that don't alter benchmark semantics, fully covered by the existing test suite, and easy to verify by reading the diff. I reviewed each commit before pushing, but flagging it explicitly so you can apply whatever extra scrutiny you'd like. Happy to redo any piece manually if you'd prefer. |
|
looks good to me, I think we may need @ChrisRackauckas s input on how to structure the docs for v 0.1, since this is a multi library setup. It can just be api documentation for this version, but I'd prefer that we take care of it in this pr |
Summary
Initial implementation of the
ReservoirComputingBenchmarkslibrary inlib/as discussed in #398. Implements the first three core benchmark tasks:Design decisions
(input, states)arrays, no RC.jl dependencylib/: separate package under the monorepo as discussedLinearAlgebra+Statistics(stdlib)Exported API
memory_capacity(input, states; max_delay, train_ratio, reg)(total, delays)generate_narma(input; order, alpha, beta, gamma, delta, normalize)narma(input, states; order, metric, train_ratio, reg, washout)(score, metric, target)ipc(input, states; max_delay, max_degree, cross_terms, reg)(total, linear, nonlinear, by_degree, by_delay, basis_capacities, theoretical_max)Closes #398 (partially — remaining 5 tasks can be added incrementally).
Test plan
Pkg.test())