pulse/ops/scan: substitute stream symbol in Scan body during pulsification#2346
Open
JulienBalianSonos wants to merge 2 commits into
Open
pulse/ops/scan: substitute stream symbol in Scan body during pulsification#2346JulienBalianSonos wants to merge 2 commits into
JulienBalianSonos wants to merge 2 commits into
Conversation
78a9882 to
6d94196
Compare
1915403 to
c5f93a5
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Alternative approach to #2337, addressing kali's feedback that the body source / outer fact discrepancy should not happen in the first place rather than being patched after the fact.
Where the drift comes from
Instrumenting
PulsedModel::newandOptimizerSession::run_all_passeswith a per-Scan checkpoint that comparesouter input factvsScan body source factshows on dpdfnet-2 (DPRNN x2):[OK](108 Scans across the model). Both outer and body still carry symbolic STREAM in their shape expressions, e.g.1 + -1*min(2, -1 + (STREAM)/160) + (STREAM)/160.Pulsifier::translate_model_with_mappings, on the very firstbefore any passcheckpoint of the post-pulse declutter loop, the Scan slots show:The drift is born inside
Pulsifier::translate_nodefor the Scan path (pulse/src/ops/scan.rs:43-51). That path clones the original Scan op, adjustsskip, and wires it as-is. The body is untouched, so its source facts still carry STREAM through their shape expressions. Meanwhile the outer wires the pulsifier just rewrote are concretized (STREAM substituted with the pulse value). Post-pulse declutter then folds the same symbolic expression to different literals in the outer scope and the body scope, and the disagreement is visible from that point onward.PR #2337's
declutter_resync_body_source_factsrule ran later and rebuilt the body to match outer. Functionally correct, but per kali, treating the symptom rather than the cause.The fix
When the Scan pulsifier wraps the original op for the non-recursive path, apply the same
symbol -> pulsesubstitution to the body and to the output_mapping that the outer pulsifier just applied to wire facts. After that, body and outer hold the same concrete expressions; post-pulse declutter folds both scopes to the same literal because there are no symbolic STREAM-dependent expressions left to fold differently.The recursive-pulse path (
pulse/src/ops/scan.rs:52-56) already bakes the substitution into the body viaPulsedModel::new(&op.body, ...). Only the clonedoutput_mappingneeded the same substitution for symmetry.Verification on dpdfnet-2
Same checkpoint, with and without the fix:
The non-scan axis
2is the correct evaluation at STREAM=320 of the symbolic expression that surrounds the chunking math. With the fix, outer and body agree on it; without the fix, outer was independently folded to1.grep -c DRIFTon the full trace:0after fix,108before fix.Pulse pipeline runs to completion. 151 chunks of 320 samples (3.00s audio) processed in 0.566s, 5.30x real-time, 3.751 ms/chunk.
Test plan
harness/core-proptest-pulse: 72/72 pass (1 ignored), unchanged vs main.harness/nnef-test-cases/scan-body-stream-drift/. Minimal NNEF graph that wires a streaming input into a Scan whose Full slot is an internally-constructed[S, 4]zero tensor. After pulsification with--pass pulse, the body source for the Full slot must be4,4(concrete) rather thanS,4(symbolic). Verified: passes with fix (exit 0), fails without fix (exit 1, grep on--pass pulse dumpcatches the residual stream symbol in the body source).