
scan + pulse: keep Scan body source facts consistent with outer wires #2337

Closed
JulienBalianSonos wants to merge 5 commits into main from feat/scan-pulse-body-fact-mgmt


JulienBalianSonos (Collaborator) commented Jun 3, 2026

This solves the DPRNN issue, but it is only a symptom fix: the underlying problem is that the Scan body should already be in sync with the main graph as far as facts are concerned.

Summary

Five commits that together let pulse-mode handle Scan ops whose body source facts depend on the streaming symbol (canonical example: DPRNN-style chunked GRUs):

  1. pulse/model: extend blockify gate to Scan body with stream-derived dim. PulsedModel::new already runs the blockify symbolic-chunk substitution when the outer graph has quadratic sections; this commit extends the gate to fire whenever any Scan body has a source fact whose shape mentions the streaming symbol. Without it, the warmup turn binds 0-length tensors against literal dims and bails with "Clashing resolution for expression. 2=2 != 0."

  2. pulse/model: substitute chunk symbol before declutter for Scan-body path. Declutter aggressively folds stream-derived expressions to literals once the streaming symbol becomes concrete-ish (e.g. (STREAM − n_fft)/hop + 1 → Val(2)). For the Scan-body case the substitution STREAM → S · pulse has to run first, so the same expression folds to 2·S − 1 and stays symbolic.

  3. core/ops/scan: resync body source facts when outer inputs drift. This is the headline change. The Scan body is a separate TypedModel whose Source facts are set at construction time. When the outer graph runs through declutter or axis-change passes that mutate the upstream wires (canonical: an EinSum NIHW,OI->OHW projecting a literal-1 batch axis on the chain feeding the Scan), the body's source facts drift out of sync. Scan::output_facts reads from body.output_fact(...) which traces back to the body source, so the drift is silent until a runtime warmup or downstream invariant check trips on it.

    New declutter_resync_body_source_facts rule rebuilds the body via wire_node so output_facts re-propagates the new shapes. Two documented bail-outs return Ok(None):

    • No drift: nothing to do.
    • A body op rejects the new shape, or the rebuild would change a body output fact in a way TypedModelPatch::replace_single_op cannot propagate: keep the original body, log a warning.

    One hard error (bail!):

    • Any expected shape is concrete with 0 volume. This is an upstream invariant violation (#2336, "pulse/MultiBroadcastTo: linearity-checked per-pulse size for stream axis", fixed the canonical trigger). Rebuilding the body with a degenerate shape would auto-konst-fold the body outputs while leaving the Source input konst-less, tripping Scan::output_facts' state-equality check on the next pass. Failing here surfaces any future upstream bug at its origin instead of letting it propagate as a confusing downstream symptom.

    Covered by two new unit tests:

    • tests::test_declutter_resync_body_source_facts constructs a Scan with a multi-input body whose Scan slot has drifted (the body still declares (1, 2, 4) while the outer chain feeding it has collapsed to (T, 1, 4)) and asserts that the rule's patch resyncs the slot to (1, 1, 4) while leaving the matching State slot untouched.
    • tests::test_declutter_resync_no_drift_no_patch locks in the early-exit when source facts already match.
  4. core/ops/source: TypedSource::change_axes returns Ok(None) for inapplicable Rm. change_axes' contract is to return Ok(None) when the change cannot be applied; TypedSource was bubbling the underlying change_shape error for Rm on a non-trivial axis, which aborts the surrounding declutter pass instead of letting ChangeAxes skip the proposal. Now maps the targeted failure mode to Ok(None).

    Covered by two new unit tests:

    • tests::change_axes_rm_non_trivial_returns_none locks in the contract for the previously-failing case.
    • tests::change_axes_rm_trivial_still_applies sanity-checks that Rm on a 1-dim still applies.
  5. core/ops/array: implement set_symbols on DynSlice and Topk. Both ops carry a TDim field that needs to ride the chunk-symbol substitution. The impl follows the existing Slice / Tile pattern: build a fresh op with substitute_all(subs)? on the TDim fields, then wire_node it into the target.
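The three outcomes described for the resync rule in commit 3 (no drift, rebuild, hard error on a zero-volume shape) can be sketched with plain shape vectors. This is a minimal illustration, not the code in the PR: `Resync` and `plan_resync` are hypothetical names, shapes are concrete `usize` vectors rather than symbolic dims, the order of the checks is an assumption, and the "body op rejects the new shape" bail-out is left out because it needs the real graph machinery.

```rust
/// Outcome of comparing a Scan body's source shapes with the shapes
/// implied by the outer wires. Hypothetical type, not part of tract.
#[derive(Debug, PartialEq)]
enum Resync {
    /// Body source shapes already match: nothing to patch.
    NoDrift,
    /// Body must be rebuilt with these source shapes.
    Rebuild(Vec<Vec<usize>>),
}

/// Decide what to do for one Scan. `body` holds the shapes currently
/// declared by the body sources, `expected` the per-iteration shapes
/// derived from the outer inputs, one entry per slot.
fn plan_resync(body: &[Vec<usize>], expected: &[Vec<usize>]) -> Result<Resync, String> {
    // Early exit: facts are already in sync.
    if body == expected {
        return Ok(Resync::NoDrift);
    }
    // A concrete zero-volume shape signals an upstream bug: fail loudly
    // here instead of rebuilding the body around a degenerate shape.
    for (slot, shape) in expected.iter().enumerate() {
        if shape.iter().product::<usize>() == 0 {
            return Err(format!("slot {slot}: expected shape {shape:?} has zero volume"));
        }
    }
    // Otherwise every slot takes the expected shape; slots that already
    // matched are unchanged by construction.
    Ok(Resync::Rebuild(expected.to_vec()))
}
```

With the shapes from the unit test description, a body declaring `[1, 2, 4]` for the Scan slot against an expected `[1, 1, 4]` yields `Rebuild`, while a matching State slot passes through unchanged.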

Why bundle them

Each commit on its own is a no-op for model classes already covered by the existing scan-warmup gate. The bundle unlocks a new class (Scan body whose source shape is a chunk_sym-derived expression) end to end. The chunk substitution is meaningless without the gate, the gate is meaningless without the substitution running pre-declutter, the set_symbols impls on DynSlice / Topk make the substitution carry through ops the pulse path actually hits, and the resync is the runtime-correctness half of the same story.
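The ordering constraint between substitution and declutter can be illustrated with plain integer arithmetic. The sketch below assumes pulse = n_fft = 2 · hop (the concrete values 256, 256 and 128 are illustrative choices, not taken from the PR), under which (S · pulse − n_fft)/hop + 1 simplifies to 2·S − 2 + 1 = 2·S − 1. The function names are hypothetical and nothing here uses tract's TDim.

```rust
// Illustrative constants (assumptions): pulse = n_fft = 2 * hop.
const HOP: i64 = 128;
const N_FFT: i64 = 256;
const PULSE: i64 = 256;

/// Frame count for a stream of `stream` samples: (STREAM - n_fft)/hop + 1.
fn frames(stream: i64, n_fft: i64, hop: i64) -> i64 {
    (stream - n_fft) / hop + 1
}

/// The same expression after substituting STREAM -> S * pulse.
fn frames_after_substitution(s: i64) -> i64 {
    frames(s * PULSE, N_FFT, HOP)
}

/// Hand-folded form under the assumed constants: 2*S - 1.
/// It still depends on S, so it stays symbolic in the chunk count.
fn frames_folded(s: i64) -> i64 {
    2 * s - 1
}
```

If a concrete stream length is plugged in before the substitution, the expression collapses to a single literal and the dependence on the chunk symbol is lost, which is the situation the reordering avoids.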

Test

  • All existing tract-core (249) and tract-pulse (37) tests pass on top of current main.
  • New: tests::test_scan_body_with_stream_derived_dim_uses_chunk_symbol (pulse) covers the gate + substitution.
  • New: tests::test_declutter_resync_body_source_facts and tests::test_declutter_resync_no_drift_no_patch (core) cover the resync rule.
  • New: tests::change_axes_rm_non_trivial_returns_none and tests::change_axes_rm_trivial_still_applies (core) cover the TypedSource contract fix.
  • Manual end-to-end: dpdfnet-2 (DPRNN×2) pulse export runs at ≈4.2x real-time on a 3 s 16 kHz clip; baseline (no DPRNN) unchanged at ≈30x real-time.
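The contract that the two change_axes tests lock in (return Ok(None) when a proposal cannot be applied, reserve errors for genuinely broken requests) can be sketched on a bare shape vector. `try_rm_axis` is a hypothetical helper, not tract's change_axes signature, and treating an out-of-range axis as the error case is an assumption made for the example.

```rust
/// Try to remove `axis` from `shape`.
///
/// - `Ok(Some(new_shape))`: the axis has length 1 and was removed.
/// - `Ok(None)`: the axis is non-trivial, so the removal is simply not
///   applicable and the caller should skip this proposal.
/// - `Err(..)`: the request itself is malformed (axis out of range).
fn try_rm_axis(shape: &[usize], axis: usize) -> Result<Option<Vec<usize>>, String> {
    match shape.get(axis) {
        None => Err(format!("axis {} out of range for rank {}", axis, shape.len())),
        // Non-trivial axis: decline without raising an error.
        Some(&dim) if dim != 1 => Ok(None),
        Some(_) => {
            let mut out = shape.to_vec();
            out.remove(axis);
            Ok(Some(out))
        }
    }
}
```

A caller iterating over candidate axis changes can then treat `Ok(None)` as "try the next proposal" and only abort the pass on `Err`.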

Performance

A/B'd PulsedModel::new declutter time on the models we have at hand, with the resync rule enabled vs disabled:

  • baseline (no DPRNN, rule fires per Scan but always returns Ok(None) no-drift): 0.628s vs 0.638s averaged over 3 runs each. The delta sits inside run-to-run noise (±15 ms / ~2%), so the no-drift fast path costs nothing measurable on models that don't need the rule.
  • dpdfnet-2 (DPRNN×2, rule actively rebuilds bodies): cannot A/B because the model fails to declutter without the rule. Total PulsedModel::new time is ~4.1s; the rule's rebuilds are a small fraction of that.
  • Runtime per-pulse evaluation (188 × 256-sample chunks on a 3 s clip): unchanged in either case (the rule only runs at declutter time, not per pulse). Baseline ≈ 30x real-time, dpdfnet-2 ≈ 4.2x real-time.

Resync overhead is below the noise floor on the models we tested.
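For reference, the runtime figures above can be turned into per-pulse numbers with a little arithmetic. The sketch assumes that "N x real-time" means audio duration divided by processing time; the derived milliseconds are implied by the reported figures, not separately measured, and the function names are made up for the example.

```rust
const SAMPLE_RATE_HZ: f64 = 16_000.0;
const PULSE_SAMPLES: u32 = 256;
const PULSES: u32 = 188;

/// Audio covered by the run: 188 * 256 = 48_128 samples, about 3.008 s.
fn audio_seconds() -> f64 {
    f64::from(PULSES * PULSE_SAMPLES) / SAMPLE_RATE_HZ
}

/// Wall-clock budget per pulse to keep up with real time: 16 ms.
fn pulse_budget_ms() -> f64 {
    f64::from(PULSE_SAMPLES) / SAMPLE_RATE_HZ * 1000.0
}

/// Implied compute time per pulse for a given real-time factor
/// (assumption: factor = audio duration / processing time).
fn compute_ms_per_pulse(real_time_factor: f64) -> f64 {
    pulse_budget_ms() / real_time_factor
}
```

Under that assumption, 4.2x real-time corresponds to roughly 3.8 ms of compute per 16 ms pulse, and 30x to roughly 0.53 ms.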

Supersedes the open feat/scan-warmup-symbolic-chunk branch (commit 1 here is the same gate, plus the rest).

JulienBalianSonos changed the title from "scan + pulse: keep Scan body source facts consistent with outer wires" to "[WIP] scan + pulse: keep Scan body source facts consistent with outer wires" on Jun 3, 2026
JulienBalianSonos marked this pull request as draft on June 3, 2026 09:21
JulienBalianSonos force-pushed the feat/scan-pulse-body-fact-mgmt branch 3 times, most recently from 5c8636b to f7fe6dd, on June 3, 2026 13:41
JulienBalianSonos changed the title from "[WIP] scan + pulse: keep Scan body source facts consistent with outer wires" to "scan + pulse: keep Scan body source facts consistent with outer wires" on Jun 3, 2026