v2: search_properties kwarg + get_idx via Auto + QuadraticSpline O(n) + FastInterpolations bench #531
d34e867 to 14bda8b
Update: rebased + correctness fixes + verification (14bda8b)
Rebased onto current
Fixes in this push
Independent verification (subagent, scripts vs merge-base worktree)
Tests (local, Julia 1.12, FFF#73 dev'd)
Runic clean. Note for merge ordering: this PR requires FindFirstFunctions v3.0.0 (#73) to be merged and registered first; CI here will fail to resolve until then.
Update: migrated to FFF v3's shim-free API (
Follow-up to the FFF rename ( |
150dfb3 to 760c2c6
FindFirstFunctions v3.0.0 is registered: this is now resolvable
Rebased onto current
Validated against the registered
Compat is
Full suite passes against the registered FindFirstFunctions v3.0.0 (resolved from General, not a local dev copy):
Also audited the API surface statically: all six
CI triage: DI's own changes are complete; remaining reds are not DI defects
All test groups pass on CI across Julia 1/lts/pre (Core, Methods, Extensions, Misc, QA). Three non-test jobs are red; here's the disposition:
Net: nothing further is needed in DI itself for the v3 that landed. Will confirm Spell Check goes green and report the Downgrade root cause.
Compares the cached `Auto(t_props)` PR (DataInterpolations) against
Interpolations.jl, Dierckx.jl, BasicInterpolators.jl, and PCHIPInterpolation.jl
across construction, single-query, sorted batch, random batch, and chained
ODE-style workloads at n ∈ {100, 1k, 10k, 100k}, m ∈ {1, 10, 1k, 100k}.
Key results (full numbers in `bench/cross_library_comparison.md`):
- DI's sorted-batch + cached Auto wins ~1700-1900× vs Dierckx on Linear/Cubic
at n=100k m=100k; loses to Interpolations(uniform) by ~16% on cubic because
the latter uses O(1) uniform-grid lookup.
- Chained ODE-style at n=100k m=1000: DI beats Dierckx by ~450× and
PCHIP by ~2× on monotone cubic; this is the workload iguesser was built for.
- DI CubicHermite beats PCHIPInterpolation on every batched cell (~2-5×).
- DI QuadraticSpline is the only consistent loser: O(n²) constructor
(7s at n=100k vs Dierckx 14ms) and evaluators ~2-5× slower than Dierckx.
Root cause is the linear-scan findfirst in `spline_coefficients!`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
`SearchProperties` (added per-cache in the previous commit) already runs
the same uniformity probe, so the parallel DI implementation is now pure
duplication. Removes:
- `linear_lookup::Bool` field on every interpolation cache. The
type-parameter list shrinks accordingly.
- `seems_linear(assume_linear_t, t)` / `looks_linear(t; threshold)` in
`interpolation_utils.jl`.
- The `assume_linear_t` keyword from every constructor. (Breaking, but
the PR is already a major refactor; `SearchProperties` runs the same
probe automatically at construction with a 1e-3 default that matches
FFF's `Auto` tolerance, and approximate-uniform vectors couldn't
benefit from `UniformStep` anyway since that path needs exact-uniform
spacing.)
`test/derivative_tests.jl`'s `func.iguesser.linear_lookup` check (which
gated the per-type chained-lookup invariant assertion to non-uniform
data) is rewritten as `!func.t_props.is_uniform`, the FFF-side
equivalent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Both `get_idx` methods (integer hint and Guesser hint) used to hard-code strategy choice: `BracketGallop` for the integer-hint path, `GuesserHint` for the Guesser path. Both branches now build a single `Auto(A.t_props)` strategy and pass the appropriate hint (`iguess` for the integer overload, `iguess(t)` for the Guesser overload) to one `searchsortedlast`/`searchsortedfirst` call.
The benefit is automatic O(1) closed-form lookup on exact-uniform `t`: `Auto`'s per-query dispatch checks the cached `is_uniform` first and short-circuits to `UniformStep` (which ignores the hint) when set, matching what Interpolations.jl's uniform fast path does. For non-uniform grids `_auto_pick` falls back to a hint-aware strategy (BracketGallop / ExpFromLeft / SIMDLinearScan) by length and hint validity, so the chained-ODE win from the previous branch is preserved.
The Guesser-hint path now stores the resulting `idx` back into `iguess.idx_prev[]`, which `GuesserHint` used to do internally. This is needed so the next correlated lookup gets the right `idx_prev` when `Auto` hasn't gone through the uniform short-circuit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
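To make the closed-form lookup concrete, here is a minimal sketch in plain Base Julia. `uniform_last_index` is a hypothetical helper written for illustration; it is not the FindFirstFunctions `UniformStep` kernel, and it assumes finite, in-domain queries on knots whose spacing is exactly representable.

```julia
# Hypothetical helper (not DI/FFF API): closed-form "last knot <= x" index
# on an exactly uniform grid, clamped to the valid index range.
function uniform_last_index(x, first_val, inv_step, n)
    pos = floor(Int, (x - first_val) * inv_step)  # 0-based cell number
    return clamp(pos + 1, 1, n)
end

t = collect(0.0:0.25:10.0)      # exactly representable uniform knots
inv_step = 1 / (t[2] - t[1])    # precomputed once, reused for every query
queries = [0.0, 0.1, 0.25, 3.3, 9.99, 10.0]

closed_form = [uniform_last_index(x, first(t), inv_step, length(t)) for x in queries]
binary = [searchsortedlast(t, x) for x in queries]

# The O(1) arithmetic agrees with the O(log n) binary search on this grid.
@assert closed_form == binary
```

The hint argument plays no role here, which is consistent with the note above that the uniform short-circuit ignores it.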
`spline_coefficients!` located the active knot index with
`findfirst(x -> x > u, k)` — an O(n) linear scan on a sorted vector — on
every call. `quadratic_spline_params` calls it n times during
construction, so the constructor was O(n²) (~7 s for n=100k vs ~14 ms
for Dierckx). The per-query eval path (`cache_parameters=false`,
default) also paid O(n) per evaluation.
Two changes:
1. Factor the locator-dependent body of `spline_coefficients!` into
`_spline_coefficients_body!(N, d, k, u, i)`. The scalar
`spline_coefficients!` now calls `searchsortedlast(k, u)` —
O(log n) on the sorted knot vector — and delegates to the body.
2. `quadratic_spline_params` maintains a running locator (the next
iteration's `searchsortedlast` index is ≥ the current's, because
`t` is sorted) and advances it amortised O(1) per knot. Total
construction is O(n).
Bench (n=100k uniform, cache_parameters=false):
QuadraticSpline construct: 6914 ms → 7.9 ms (~880×)
QuadraticSpline eval: 57.5 μs → 14.4 μs (~4×)
`spline_coefficients!` keeps `N .= zero(u)` at the top — BSpline
derivative paths (`_derivative(::BSplineInterpolation, …)`) read the
entire `sc` vector, so positions outside the body's
`nonzero_coefficient_idxs` window must be zero on every call. Dropping
that zero pass was an attempted further optimisation that silently
broke BSpline derivatives by leaking stale values from previous
queries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
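The running-locator idea can be sketched in plain Base Julia. The function names below are hypothetical and this is not the `quadratic_spline_params` code; it only illustrates why sorted queries let the locator advance in amortised O(1) instead of rescanning from the start.

```julia
# For each query in sorted `us`, find the last knot index i with k[i] <= u.

# Rescans from the start for every query: O(n) each, O(n^2) overall.
function locate_scan(k, us)
    return [something(findfirst(x -> x > u, k), length(k) + 1) - 1 for u in us]
end

# Keeps a running index that only ever moves forward: O(n + m) overall.
function locate_running(k, us)
    out = Vector{Int}(undef, length(us))
    i = 0
    for (j, u) in pairs(us)
        while i < length(k) && k[i + 1] <= u
            i += 1
        end
        out[j] = i
    end
    return out
end

k = [0.0, 0.5, 1.5, 2.0, 4.0]
us = [0.0, 0.25, 1.5, 3.0, 4.0]   # must be sorted for the running locator

@assert locate_scan(k, us) == locate_running(k, us)
@assert locate_running(k, us) == [searchsortedlast(k, u) for u in us]
```

The running version is only valid because the queries are non-decreasing; an unsorted query sequence would need the `searchsortedlast` form.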
Breaking change: every interpolation constructor that previously built
`t_props = FindFirstFunctions.SearchProperties(t)` internally now accepts an
optional `search_properties::Union{Nothing,FindFirstFunctions.SearchProperties}`
keyword. When omitted (the default), behaviour is identical to before:
`SearchProperties(t)` is built once and shared between the no-integral and
with-integral inner constructor calls — fixing a pre-existing redundancy.
When supplied, the caller's `SearchProperties` is used as-is, which gives
domain experts the ability to opt into FFF strategies that the data-driven
probes can't detect cheaply:
- `LinearInterpolation(u, t; search_properties =
SearchProperties(t; is_uniform = true))`
opts a `Vector` with float-noise into `UniformStep`'s closed-form O(1)
lookup, which the probe rejects (because float-noise exceeds the
1e-12 uniformity tolerance).
- Sharing a single populated `SearchProperties` across many interpolations
that share `t` avoids redundant probe work.
This subsumes the old `assume_linear_t` knob (already dropped on this
branch) and is more powerful — callers control every property in
`SearchProperties`, not just the linearity flag.
The inner struct constructors (called twice during construction for the
cumulative-integral pre-build path) now take `t_props` as a positional
argument, so the probe runs at most once per interpolation regardless.
All 15 cache types (`LinearInterpolation`, `QuadraticInterpolation`,
`LagrangeInterpolation`, `AkimaInterpolation`, `ConstantInterpolation`,
`SmoothedConstantInterpolation`, `QuadraticSpline`, `CubicSpline`,
`BSplineInterpolation`, `BSplineApprox`, `CubicHermiteSpline`,
`QuinticHermiteSpline`, `SmoothArcLengthInterpolation`,
`LinearInterpolationIntInv`, `ConstantInterpolationIntInv`) updated.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
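The optional-keyword pattern described above can be sketched without either package. `GridProps`, `probe`, and `SketchInterp` are hypothetical stand-ins for `SearchProperties`, its uniformity probe, and an interpolation cache; the real types carry more fields and the real probe is cheaper than the full scan used here.

```julia
# Hypothetical stand-in for SearchProperties: just the uniformity flag.
struct GridProps
    is_uniform::Bool
end

# Hypothetical probe: flags the grid uniform only if every spacing matches
# the first one to within `rtol`.
function probe(t; rtol = 1e-12)
    d = diff(t)
    return GridProps(all(x -> isapprox(x, d[1]; rtol = rtol), d))
end

struct SketchInterp
    u::Vector{Float64}
    t::Vector{Float64}
    props::GridProps
end

# The probe runs at most once, and only when the caller supplies nothing.
function SketchInterp(u, t; search_properties::Union{Nothing, GridProps} = nothing)
    props = search_properties === nothing ? probe(t) : search_properties
    return SketchInterp(u, t, props)
end

t = [0.0, 1.0, 2.0 + 1e-9, 3.0]   # uniform up to float noise
u = [0.0, 1.0, 4.0, 9.0]

probed = SketchInterp(u, t)                                        # probe rejects
forced = SketchInterp(u, t; search_properties = GridProps(true))   # caller opts in
shared = SketchInterp(2 .* u, t; search_properties = forced.props) # reuse, no probe
```

Forcing the flag on noisy knots shifts responsibility for correctness to the caller, which is the hazard the later live-knot verification commit addresses.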
…sed bench
- Add `FastInterpolations` as a build/eval target for every supported
algorithm in `bench/cross_library_comparison.jl`: Linear, CubicSpline,
QuadraticSpline, Akima, and PCHIP (the last one as "FastInterpolations
(PCHIP)" alongside the existing PCHIPInterpolation entry).
- Add `bench/fast_interpolations_bench.jl`, a port of FastInterpolations'
own `benchmark/interpolation_benchmark.jl` from upstream commit
`616b106b`. Mimics the fusion-physics matrix-of-interpolants workload
they advertise on their README: `mpert × mpert` independent
interpolants on a uniform 1D grid, evaluated at `n_eval` cubic-spaced
query points. Compares against Interpolations.jl, DataInterpolations.jl,
Dierckx.jl, and FastInterpolations' `Series` interpolant. CLI matches
the upstream: `--linear | --quadratic | --cubic | --constant` ×
`--tiny | --small | --default | --large`.
The bench/Project.toml gains a `FastInterpolations` dep.
Read-only audit — no upstream changes to FastInterpolations.jl.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Full sweep (4.4 min, n ∈ {100, 1k, 10k, 100k}, m ∈ {1, 10, 1k, 100k},
both uniform and non-uniform grids) regenerated with FastInterpolations
included. Plus a `FastInterpolations.jl advertised benchmark` section
running the matrix-of-interpolants workload they publish on their README
(cubic spline + linear at npsi=64, mpert=100, n_eval=1000), and a
`Findings` section explaining where FastInterpolations beats DI, where DI
matches, and which gaps are out of scope for this PR.
Summary of findings:
- FastInterpolations.jl Series API is 70-100× faster on matrix-of-
interpolants workloads — they compute the cell anchor once per query
and reuse it across thousands of coefficient series. No equivalent in
DI; would need a separate `SeriesInterpolation` type proposal.
- Per-query scalar latency on `Vector{Float64}` grids: DI ~100-200 ns,
FI ~50 ns. The gap is `Auto(props)` dispatch overhead vs FI's direct
`_search_direct(::_CachedRange, q)` (which compiles to ~3 instructions).
Closeable in a follow-up by resolving Auto to a concrete strategy at
construction time.
- Non-uniform CubicSpline/Akima construction: DI within ~30% of FI at
n ≥ 10k thanks to the O(n) `spline_coefficients!` fix on this branch.
- Sorted-batch on non-uniform: DI competitive at large m.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Adds a `strategy::strategyType` field to every interpolation cache (13
caches in `interpolation_caches.jl` + 2 inverses in `integral_inverses.jl`).
The strategy is resolved at construction time via `_resolve_strategy(t)`
to a concrete `FindFirstFunctions.SearchStrategy` singleton (`BracketGallop`).
`get_idx` now reads `A.strategy` directly instead of wrapping `A.t_props`
in `FindFirstFunctions.Auto(...)` per call.
The win is type-level: `Auto`'s per-query `_auto_pick` returned
`Union{BinaryBracket, LinearScan, BracketGallop}` based on `length(v)` and
hint validity, forcing a runtime branch + small-union dispatch on every
`get_idx`. Storing the resolved strategy as a singleton field lets the
compiler inline `searchsortedlast(::BracketGallop, ...)` end-to-end.
Always picks `BracketGallop` (not size-dependent): the `LinearScan` branch
for `length(t) <= 16` would make `_resolve_strategy` return a small union
and propagate that union into the cache's type parameters — breaking
`@inferred` tests downstream. The `LinearScan` benefit at tiny `n` is
~10 ns in absolute terms; not worth the inference instability.
Single-query latency on uniform `Vector` grid (FastInterpolations parity
target, BenchmarkTools median):
| n | before | after | FI | ratio before -> after |
|-------|---------|---------|--------|-----------------------|
| 100 | 70 ns | 70 ns | 50 ns | 1.4x -> 1.4x |
| 1000 | 80 ns | 70 ns | 50 ns | 1.6x -> 1.4x |
| 10000 | 90 ns | 70 ns | 60 ns | 1.5x -> 1.17x |
Batched paths still pass `FindFirstFunctions.Auto(A.t_props)` to
`searchsortedlast!` — the batched-Auto specialization picks `LinearScan`/
`SIMDLinearScan`/`InterpolationSearch`/`BracketGallop`/`ExpFromLeft` based
on `(gap, has_nan, is_linear)`, which the per-query path's
`BracketGallop` can't replicate. The per-batch Auto probe amortises
across queries; the per-query Auto probe didn't.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
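The inference argument for always picking one strategy can be reproduced with two toy singleton types. `GallopSketch` and `ScanSketch` are hypothetical stand-ins for FFF strategy singletons, and `Base.return_types` is used only as a quick way to inspect what the compiler infers.

```julia
# Hypothetical stand-ins for two strategy singletons.
struct GallopSketch end
struct ScanSketch end

# Size-dependent choice: the return type depends on a runtime value.
resolve_by_size(t) = length(t) <= 16 ? ScanSketch() : GallopSketch()

# Fixed choice: one concrete return type.
resolve_fixed(t) = GallopSketch()

by_size = Base.return_types(resolve_by_size, (Vector{Float64},))[1]
fixed = Base.return_types(resolve_fixed, (Vector{Float64},))[1]

# The size-dependent resolver infers to a small Union, which would leak into
# any struct type parameter that stores its result.
@assert !isconcretetype(by_size)
@assert fixed === GallopSketch
```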
Change `_resolve_strategy(t)` from a fixed `BracketGallop()` to
`FindFirstFunctions.Auto(t)`. Auto now resolves a concrete `StrategyKind`
at construction from `length(t)` + `SearchProperties{T}(t)`:
- Uniform data (`AbstractRange` or `Vector` whose 9-point linearity
probe is within ~1e-12 of exact uniformity) → `KIND_UNIFORM_STEP`.
The props-aware kernel uses a precomputed `inv_step` for closed-form
O(1) lookup (no per-query division).
- Non-uniform data with `length(t) ≤ 16` → `KIND_LINEAR_SCAN`.
- Otherwise → `KIND_BRACKET_GALLOP` (the previous default).
`Auto{T}` is parametric on the data ratio type, so each cache's
`strategyType` resolves to a single concrete `Auto{T}` per `t` —
type-stable per dispatch.
Mooncake's `increment_and_get_rdata!` gains a populated-RData method to
handle the new `first_val::T` / `inv_step::T` fields on
`SearchProperties{T}` — they're not differentiable (compile-time
constants from the knot vector) but Mooncake sees them as `Float64`
fields and routes the rdata through the populated branch.
Per-query latency (n=10k, m=1k, ns/query, BenchmarkTools median):
Workload | before | after | FastInterp
--- | --- | --- | ---
Range, sorted queries | 89 | 75 | 3.2
Range, random queries | 89 | 76 | 3.3
Range, chained (monotone) | 89 | 75 | 3.3
Uniform Vec, sorted queries | 47 | 32 | 70
Uniform Vec, random queries | 50 | 35 | 92
Uniform Vec, chained | 48 | 32 | n/a
Non-uniform Vec, sorted queries | 68 | 85 | 75
Non-uniform Vec, random queries | 80 | 95 | 87
Non-uniform Vec, chained | 67 | 86 | n/a
Wins: 16% on Range (props-aware UniformStep), 32% on Uniform Vector
(closed-form kernel — beats FastInterp here because DI's Guesser
overhead is amortised over fewer cycles than FastInterp's per-query
binary search). Loss: ~25% on Non-uniform Vector (Auto's per-query
`s.kind === KIND_UNIFORM_STEP` branch adds ~5-20 ns/q on the
BracketGallop path; the closed-form kernel inlines into the function
body and adds register pressure on the cold path).
Net: still 4-25× slower than FastInterp on Range — the remaining gap is
DI's per-query overhead (Guesser hint, extrapolation check, linear-interp
arithmetic), not the strategy. Closing it further would require fusing
the search + interp + extrapolation paths into a single kernel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
`bench/di_perq_bench.jl` measures single-query latency at n=10k, m=1k across (range, uniform Vector, non-uniform Vector) × (sorted, random, chained), the cells where the Auto + props refactor matters most. Reports the resolved Auto.kind for each input so it's clear which path each measurement exercised.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Bump `[compat]` to `FindFirstFunctions = "2, 3"` so DI can pick up the
v3 parametric `SearchProperties{T}` + props-aware UniformStep kernel.
Add `FindFirstFunctions` to `bench/Project.toml` for explicit dev'ing
during cross-library benchmarks.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
With DI's strategy now being `Auto{T}` (which carries a populated
`SearchProperties{T}` with `first_val::Float64` and `inv_step::Float64`
fields), Mooncake's analyzer can no longer prove the strategy struct is
non-differentiable. It tries to derive a rule for
`searchsortedlast(::Auto, v, x, h)` by recursion into FFF's strategy
kernels, hitting `Core.Intrinsics.llvmcall` in the SIMD-scan paths
(SIMDLinearScan, BitInterpolationSearch) which Mooncake can't translate.
Declare the searchsortedlast/searchsortedfirst calls dispatched through
`Auto` as `@zero_adjoint`: the return is an `Int` index, gradient flow is
already cut at the integer-indexing boundary in `_interpolate`, so a zero
rrule is correct.
This unblocks `ConstantInterpolation`, `CubicSpline`,
`CubicHermiteSpline`, `QuinticHermiteSpline`, `LagrangeInterpolation`,
`AkimaInterpolation`, `BSplineInterpolation`, and `BSplineApprox`
gradients via Mooncake — interpolations that don't have a `_interpolate`
Mooncake-wrapped rrule and were derived by recursion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
`_resolve_strategy(t) = FindFirstFunctions.Auto(t)` (this PR's hot-path
change) calls the parametric `Auto(::AbstractVector)` constructor added
in FFF v3. That constructor does not exist in FFF v2 (`Auto` only accepts
`SearchProperties` or no args), so `[compat] FindFirstFunctions = "2, 3"`
would resolve to v2 on CI and break with `MethodError: Cannot convert
::Vector{Float64} to ::SearchProperties`.
Pin compat to v3 only. This PR is blocked on FFF v3 release; until v3 is
in the registry, CI will fail with "compatible version not found".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Encode the knot vector's uniformity in `LinearInterpolation`'s cache
type via a new `IsUniform` `Val{B}` parameter, populated by
`_static_uniform_tag` at construction time. For `AbstractRange{<:Real}`
knots the tag is `Val(true)` statically (`@inferred`-clean); for `Vector`
knots it routes through `t_props.is_uniform`, making the construction
return type a `Union{LinearInterpolation{..., true},
LinearInterpolation{..., false}}` — each concrete instance is fully
type-stable per query.
A new `_interpolate(::LinearInterpolation{<:AbstractVector{<:AbstractFloat},
..., true}, t, iguess)` method takes the uniform fast path: closed-form
index lookup via `(t - first_val) * inv_step`, then linear-blend
`u[idx] + α * (u[idx+1] - u[idx])`. This skips both the `get_idx`
search-via-`Auto` round-trip and the `A.t[idx]` load. The result type is
constrained to `<:AbstractFloat` `u` to preserve the existing
slope-form's `Rational`/`Integer` semantics on those eltypes.
NaN propagation matches the non-uniform method (NaN query produces NaN
derivative via ForwardDiff; NaN-adjacent `u` doesn't poison exact-knot
queries via `0 * NaN = NaN`).
Per-query latency, `n = 10_000`, `m = 1000`, Float64, sorted queries:
Workload | Before | After
--------------------------------------|-----------|--------
Range knots | 76.7 ns/q | 12.1 ns/q
Uniform Vector knots (`collect(t)`) | 55.5 ns/q | 6.5 ns/q
Non-uniform Vector knots | 88.4 ns/q | 84.9 ns/q
DI ↔ FastInterpolations gap on Range knots: 23.7× → 3.7×. Non-uniform
Vector path is statically unchanged (different `_interpolate` method
specialization) and shows no regression.
A few existing `@inferred` calls in test/interface.jl and
test/interpolation_tests.jl are dropped for `Vector` knot construction
cases — the constructor genuinely returns a `Union` for those, by
design. The query-side `@inferred` calls remain; per-query dispatch is
fully type-stable on every concrete cache instance. The parity test
documents the realistic error bound: the lerp form differs from the
slope form by `O(length(t)) * eps * max(|u|)`, dominated by the
`(t - first_val) * inv_step` multiplication.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…knots
Two correctness fixes for the statically-dispatched LinearInterpolation uniform-grid kernel:
- Clamp the closed-form float position in the float domain before unsafe_trunc. With Extension extrapolation the query can be far outside the knot span, where the position exceeds typemax(Int) and unsafe_trunc is UB (returned garbage indices for t = 1e300).
- Verify the guessed cell and its spacing against the live knots before using it. push!/append! mutate A.t while t_props (and the IsUniform type tag) keep their construction-time values, so the precomputed first_val/inv_step can go stale; a caller-forced is_uniform = true on non-uniform knots is the same hazard. Previously both silently corrupted interpolated values.
On verification failure the evaluation falls back to the general slope-form path (slower, always correct), which is extracted into _linear_slope_interpolate so both methods share it. α is now computed cell-locally from the verified left knot, which also tightens the lerp-vs-slope roundoff gap.
Regression tests: Extension extrapolation at ±1e300, a knot vector uniform at the sampled probe points but jittered between them (must not be classified uniform), and push!-after-construction breaking the spacing.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
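The first fix (clamp in the float domain, then truncate) can be sketched as follows. `safe_cell` is a hypothetical helper, not the DI kernel, and the explicit NaN guard is an assumption added here so the sketch never passes NaN to `unsafe_trunc`.

```julia
# Hypothetical helper: left index of the cell containing x on a uniform grid
# of n knots, safe for queries far outside the knot span.
function safe_cell(x::Float64, first_val::Float64, inv_step::Float64, n::Int)
    pos = (x - first_val) * inv_step
    isnan(pos) && return 1                  # assumption: send NaN to cell 1
    pos = clamp(pos, 0.0, Float64(n - 2))   # clamp BEFORE converting to Int
    return unsafe_trunc(Int, pos) + 1       # now always within 1:(n - 1)
end

n = 41                      # knots 0.0, 0.25, ..., 10.0
first_val, inv_step = 0.0, 4.0

inside = safe_cell(0.3, first_val, inv_step, n)
far_right = safe_cell(1.0e300, first_val, inv_step, n)
far_left = safe_cell(-1.0e300, first_val, inv_step, n)
```

Clamping the integer after truncation would be too late: for `x = 1e300` the float position is about `4e300`, far beyond `typemax(Int)`, so the conversion itself is the step that must be protected.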
Constructors computed t_props (possibly caller-supplied via the search_properties kwarg) and then called Auto(t), which re-ran the SearchProperties probe internally and ignored the cached/supplied props: a redundant O(n) scan and an inconsistency where A.strategy.props could disagree with A.t_props. Resolution now goes through Auto(t, t_props) so the two always match.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
DataInterpolations.looks_linear no longer exists; the @docs block referencing it would fail the docs build. NEWS now records the breaking assume_linear_t removal, the search_properties kwarg, the Auto-resolved knot search, the uniform fast path, and the O(n) QuadraticSpline construction.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…roperties kwarg
- The Type Inference testset had switched every method to Range knots to accommodate LinearInterpolation's value-dependent IsUniform tag, dropping Vector-knot constructor inference for the other types, which do not carry the tag and still infer. Vector knots are restored for all methods; LinearInterpolation gets the Range-knot constructor check plus a query-side inference check on a Vector-knot instance.
- Add the search_properties keyword bullet to every constructor docstring (the assume_linear_t bullet it replaces was removed without a replacement).
- Drop conversation/PR references from bench file headers.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
FindFirstFunctions v3 no longer extends Base.searchsortedlast / Base.searchsortedfirst with strategy methods. get_idx now calls FindFirstFunctions.search_last / search_first on the cached Auto strategy, the Mooncake @zero_adjoint declarations target the new functions, and the test helpers (test_cached_index, the derivative cached-index check) use the qualified v3 names.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…hsorted_first
FindFirstFunctions renamed its dispatchers to restore the 'sorted' cue. Update get_idx, the Mooncake @zero_adjoint declarations, and the test helpers to the new qualified names.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The crate-ci/typos spell check flags the abbreviation 'strat' as a likely misspelling of 'start'/'strata'. Spell out the local variable.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
efcacdf to 74b0b5a
Rebased onto current
Re-verified under the new harness against the registered FindFirstFunctions v3.0.0: Core (interpolation_tests.jl 17,412 + my fast-path parity / jittered-probe / Extension-extrapolation / push!-staleness tests, plus #543's duplicate-knot test) and Methods (incl. the migrated
Note: master's Downgrade is still red from the Julia-1.10 floor conflict (#546's
Every interpolation cache stored both t_props::SearchProperties{T} and
strategy::Auto{T}, where Auto = (kind, props) — so the SearchProperties
lived in the struct twice and the cache carried two T-driven type
parameters (propsType + strategyType) that always moved together.
The cached strategy was only ever read by get_idx; the fast path and the
batched eval paths already work off t_props (and the batched API
re-resolves the kind anyway). So replace strategy::Auto{T} (parametric)
with kind::StrategyKind (a UInt8 enum, non-parametric), dropping the
strategyType parameter from all 15 cache structs (13 interpolation caches
plus the two integral-inverse caches). The parametric props payload stays
as the single t_props field.
get_idx now branches on the cached kind: KIND_UNIFORM_STEP reconstructs
the (isbits, stack-allocated) Auto from t_props so the closed-form
O(1) lookup is taken; every other kind ignores the props and dispatches
the bare enum, preserving the hint-aware gallop. The Mooncake extension
gains zero_adjoint declarations for the StrategyKind dispatch form
alongside the existing Auto ones.
LinearInterpolation's uniform fast-path _interpolate signature drops one
positional <:Any to match the removed type parameter.
All 5 test groups pass against registered FindFirstFunctions v3.0.0
(Core/Methods/Extensions incl. Mooncake/Zygote/SCT/Misc/QA incl.
AllocCheck). Runic clean.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Collapsed the redundant strategy field (
…, not a type param
The static fast-path work encoded knot uniformity in a Val{IsUniform}
*type parameter* on LinearInterpolation. Since a Vector's uniformity is
a runtime property, the constructor returned
Union{LinearInterpolation{...,true}, LinearInterpolation{...,false}} —
type-unstable. That had been worked around by relaxing/removing the
@inferred tests, which is not acceptable.
Follow FFF's own design instead: encode the choice as a runtime enum and
branch on it, so every returned type is inferred.
- Drop the IsUniform type parameter and the is_uniform_static::Val field
from LinearInterpolation; the cache is a single concrete type again
(constructor inferred for both Vector and Range knots).
- Select the uniform closed-form path with a runtime branch on the cached
kind enum inside _interpolate (A.kind === KIND_UNIFORM_STEP), with both
arms returning the same concrete type so the query stays inferred. This
mirrors FFF's runtime StrategyKind dispatch (concrete Int return
regardless of the runtime kind).
- Remove the now-unused _static_uniform_tag helper.
- Restore the inference tests: the Type Inference testset runs @inferred on
every constructor for Vector and Range knots (plus query inference), and the
LinearInterpolation testset's relaxed @inferred guards are reverted to
master's unconditional form. is_uniform_static checks become kind checks.
Verified: all 7 constructors and all 13 interpolation query paths infer;
full suite green against registered FindFirstFunctions v3.0.0 (Core incl.
the restored Type Inference testset, Methods, Extensions incl.
Mooncake/Zygote/SCT, Misc, QA incl. AllocCheck). Runic clean.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
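A minimal sketch of the runtime-enum design, in plain Base Julia. `GridKind` and `LinSketch` are hypothetical names and are much simpler than the real cache: no extrapolation modes, no live-knot verification, finite queries only. The point is that uniform and non-uniform inputs produce the same concrete type, and both branches of the query return a `Float64`.

```julia
# Hypothetical runtime tag; the real one is FFF's StrategyKind.
@enum GridKind::UInt8 UNIFORM_STEP BRACKET_SEARCH

struct LinSketch
    u::Vector{Float64}
    t::Vector{Float64}
    kind::GridKind      # a value, not a type parameter
    inv_step::Float64   # only meaningful when kind === UNIFORM_STEP
end

function LinSketch(u::Vector{Float64}, t::Vector{Float64})
    d = diff(t)
    kind = all(==(d[1]), d) ? UNIFORM_STEP : BRACKET_SEARCH
    return LinSketch(u, t, kind, 1 / d[1])
end

# Query: branch on the cached kind; finite x assumed.
function (A::LinSketch)(x::Float64)
    n = length(A.t)
    if A.kind === UNIFORM_STEP
        pos = clamp((x - A.t[1]) * A.inv_step, 0.0, Float64(n - 2))
        i = unsafe_trunc(Int, pos) + 1                 # closed-form cell
    else
        i = clamp(searchsortedlast(A.t, x), 1, n - 1)  # search fallback
    end
    α = (x - A.t[i]) / (A.t[i + 1] - A.t[i])
    return A.u[i] + α * (A.u[i + 1] - A.u[i])
end

A_uniform = LinSketch([0.0, 10.0, 20.0, 30.0], [0.0, 1.0, 2.0, 3.0])
A_jagged = LinSketch([0.0, 10.0, 30.0], [0.0, 1.0, 3.0])
```

Because the tag is a field value, `typeof(A_uniform) === typeof(A_jagged)`, so a constructor written this way has a single inferred return type regardless of the data.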
Fixed the inference regression — uniformity is now a runtime enum, not a type parameter (
The single runtime-kind-branch uniform path is right for Vector knots
(uniformity is a value property; a Vector can be push!-mutated or
caller-forced-uniform on jittered data, so it needs the live-knot
cell/spacing verification + slope fallback). But an AbstractRange is
uniform at the *type* level and is immutable + exactly spaced, so it can
take a leaner kernel — dispatched statically on typeof(t), which is
inference-safe (no value-dependent Union, unlike the old IsUniform tag).
Add _interpolate(::LinearInterpolation{<:AbstractVector{<:AbstractFloat},
<:AbstractRange}, ...) → _linear_uniform_range_interpolate, which:
- skips the runtime kind branch (ranges are always KIND_UNIFORM_STEP),
- skips the cell/spacing verification (a range can't go stale),
- computes α directly from the float position (α = f - idx0), avoiding
the two A.t[idx] range-arithmetic loads,
- skips the vestigial iguesser store (the closed form never searches).
It keeps full NaN handling (NaN query → NaN derivative; NaN-adjacent u
resolved by exact-knot comparison), matching the other methods.
Vector knots keep the runtime-branch verified path unchanged; non-Float
u and non-uniform knots keep the slope form.
Measured per-query (n=10k, monomorphic loop): range A(q) 30.5 -> 17.0
ns/q; uniform Vector unchanged (18.4); non-uniform unchanged. Inference
stays clean (constructor + query inferred for both Range and Vector),
AD works through the kernel (Mooncake/Zygote/SCT green), and it is
allocation-free (AllocCheck). All 5 groups pass against registered
FindFirstFunctions v3.0.0.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
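The static split can be illustrated with two methods of one hypothetical function, `lerp_sketch`, dispatched on the knot container type. This is a sketch under simplifying assumptions (finite queries, no NaN handling, no staleness verification) and not the `_linear_uniform_range_interpolate` implementation.

```julia
# Range knots: uniform by type, so the cell and the blend weight both come
# from the float position; no knot loads and no search.
function lerp_sketch(u::Vector{Float64}, t::AbstractRange, x::Float64)
    f = (x - first(t)) / step(t)                         # float position
    idx0 = clamp(floor(f), 0.0, Float64(length(t) - 2))  # 0-based cell
    i = Int(idx0) + 1
    α = f - idx0                                         # α = f - idx0
    return u[i] + α * (u[i + 1] - u[i])
end

# Vector knots: general path, reads the live knots on every query.
function lerp_sketch(u::Vector{Float64}, t::Vector{Float64}, x::Float64)
    i = clamp(searchsortedlast(t, x), 1, length(t) - 1)
    α = (x - t[i]) / (t[i + 1] - t[i])
    return u[i] + α * (u[i + 1] - u[i])
end

t = 0.0:0.5:2.0
u = [0.0, 1.0, 4.0, 9.0, 16.0]

from_range = lerp_sketch(u, t, 0.75)
from_vector = lerp_sketch(u, collect(t), 0.75)
```

Which method runs is decided by `typeof(t)` at compile time, so neither call site carries a value-dependent `Union`.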
Performance: static lean fast path for
| Path | Range knots | Uniform Vector |
|---|---|---|
| runtime-branch A(q) (before) | 30.5 ns/q | 18.4 ns/q |
| + static range fast path (after) | 17.0 ns/q | 18.4 ns/q (unchanged) |
The runtime branch itself is ~free on Vectors (≈0.5 ns). The real opportunity was that a range was paying the Vector kernel's overhead — cell/spacing verification, two A.t[idx] range-arithmetic loads, and the iguesser store — all provably unnecessary for an immutable, exactly-uniform range.
The split (two inference-safe uniform paths):
- Static, for `t::AbstractRange`: dispatched on `typeof(t)` (inference-safe, no value-dependent `Union`), a lean kernel that skips the branch + verification + `A.t` loads (`α = f - idx0`) + the vestigial iguesser store. Keeps full NaN handling.
- Runtime, for `Vector` knots: the `kind`-branch verified kernel, needed because a `Vector` can be `push!`-mutated or caller-forced-uniform on jittered data. (`search_properties = SearchProperties(t; is_uniform=true)` designates uniformity via the runtime `kind`; a Vector type can't carry a static guarantee.)
- Non-uniform / non-Float `u`: slope form.
Verified: all 5 groups green against registered FFF v3.0.0; inference clean for both Range and Vector (constructor + query); AD correct through the new kernel (Mooncake 604 / Zygote 11974 / SCT 600); allocation-free (AllocCheck). Range interior matches the slope form to 3e-15.
Summary
Coordinated update with FindFirstFunctions.jl PR #73 (v3.0 release with enum-tagged dispatch + parametric `SearchProperties{T}` + props-aware `UniformStep`). This PR follows up on the merged #529.
The original #531 had six pieces:
1. Drop the legacy `linear_lookup` / `seems_linear` / `looks_linear` machinery. Every interpolation cache had a `linear_lookup::Bool` field set from a hand-rolled `looks_linear(t; threshold)` probe. That's exactly what `t_props.is_uniform` already gives us. Breaking change: `assume_linear_t` kwarg removed from all constructors. For explicit override, use `SearchProperties(t; is_uniform = true)`.
2. Refactor `get_idx` to dispatch through `Auto(A.t_props)`. (Now updated in this revision; see point 6 below.)
3. Fix O(n²) QuadraticSpline construction. `quadratic_spline_params` was calling `spline_coefficients!`, which had a `findfirst(x -> x > u, k)` linear scan inside a per-knot loop. Replaced with a running pointer that advances monotonically → O(n) total. ~870× speedup at n=100k.
4. `search_properties` constructor kwarg on every interpolation type. Optional `search_properties::Union{Nothing, FindFirstFunctions.SearchProperties} = nothing`. When omitted, behaviour is identical to before; when supplied, the caller's `SearchProperties` is used as-is.
5. FastInterpolations.jl added to the cross-library bench. `bench/cross_library_comparison.jl` now benchmarks FastInterpolations.jl across Linear/Cubic/Quadratic/Akima/PCHIP; `bench/fast_interpolations_bench.jl` ports their advertised bench.
6. Route `_resolve_strategy` through `Auto(t)` (updated in this revision). The earlier draft pinned `_resolve_strategy(t)` to `BracketGallop()` to avoid Union-typed strategy fields. With FFF v3's parametric `Auto{T}`, the strategy field is now `Auto{T}` where `T` is the data ratio type: single concrete per `t`, type-stable. `Auto(t)` resolves to:
   - `KIND_UNIFORM_STEP` for uniformly-spaced data, taking the props-aware closed-form kernel (one subtract, one multiply, one truncate per query).
   - `KIND_LINEAR_SCAN` for `length(t) ≤ 16` non-uniform.
   - `KIND_BRACKET_GALLOP` otherwise (the previous default).
Mooncake ext gains a `@zero_adjoint` declaration for `searchsortedlast(::Auto, ...)`: the integer index isn't differentiable and Mooncake's recursion through `Auto{Float64}` (which contains `Float64` `first_val`/`inv_step` fields) would otherwise hit `llvmcall` SIMD intrinsics in FFF's strategy kernels.
Per-query latency (n = 10k, m = 1k, ns/query, BenchmarkTools median)
Wins: 15-33% on uniform data (Range and uniform `Vector`). Cost: ~5-20 ns/q on non-uniform Vector (Auto's per-query `s.kind === KIND_UNIFORM_STEP` branch adds register pressure on the BracketGallop path; FFF's enum-dispatch function body inlines both KIND paths into the call site).
The DI ↔ FastInterp gap on Range narrowed from 26× (89 ns/q ÷ 3.5 ns/q) to 22× (75 ÷ 3.3). The remaining gap is DI's per-query overhead (Guesser hint, extrapolation check, linear-interp arithmetic), not the strategy. Closing further would require fusing search + interp + extrapolation into a single kernel.
Reproducer: `bench/di_perq_bench.jl` (added this revision).
Tests
All 5 groups pass on Julia 1.12:
`@zero_adjoint` fix above)
FFF v3 dependency
`[compat]` requires `FindFirstFunctions = "3"` (FFF #73 must merge and register first; CI here cannot resolve until then). v3 brings:
- `SearchProperties{T}` with precomputed `first_val::T` / `inv_step::T`.
- `Auto{T}` carrying `SearchProperties{T}`.
- `UniformStep` kernel folding in the closed-form O(1) lookup from #74 (which is now closed as superseded).
FFF v3 removed the v2 `Base.searchsortedlast(::S, ...)` extensions entirely; DI's call sites (`get_idx`, Mooncake `@zero_adjoint`, test helpers) are migrated to `FindFirstFunctions.search_last` / `search_first`.
Notes
Draft. Please ignore until reviewed by @ChrisRackauckas.
🤖 Generated with Claude Code
Co-Authored-By: Chris Rackauckas accounts@chrisrackauckas.com