Previously [link_and_optimize] always had [Binaryen.link] write to a temp file, optionally had [Binaryen.dead_code_elimination] write to a second temp file, and then had [Binaryen.optimize] (or, at --opt 1, a file copy) produce the final output. That means at --opt 1 we were reading the DCE'd wasm back into memory and writing it out as a copy just to land it at the right path.

Restructure [link_and_optimize] into three helpers ([link], [dce], [optimize]) plus a [with_step_output] combinator that yields either the final [output_file] (when the pass is the last one in the pipeline) or a fresh pair of temp files. The last pass in the chain now always writes straight to [output_file] and its sourcemap to [opt_sourcemap_file], so no copy is needed and there are no intermediate passes through memory.

The four profiles work as before:
- dynlink + O1: link -> output_file
- dynlink + O2/O3: link -> temp; optimize -> output_file
- !dynlink + O1: link -> temp; dce -> output_file
- !dynlink + O2/O3: link -> temp; dce -> temp'; optimize -> output_file
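The routing rule behind the four profiles can be sketched as a small pure function. This is only an illustration: the [pass] and [target] types and the [passes] and [plan] functions are hypothetical names, not the actual helpers, and the real code works on files rather than on a plan value.

```ocaml
(* Hypothetical model of the pipeline: which passes run, and where each
   one writes. None of these names come from the actual code. *)
type pass = Link | Dce | Optimize

type target =
  | Temp of int (* a fresh temp file, numbered by position in the chain *)
  | Output_file (* the final [output_file] *)

(* Passes that run for a profile: DCE is skipped when dynlinking, and
   the optimizer only runs at --opt 2 and above. *)
let passes ~dynlink ~opt_level =
  [ Link ]
  @ (if dynlink then [] else [ Dce ])
  @ (if opt_level >= 2 then [ Optimize ] else [])

(* Only the last pass in the chain writes to the final output; every
   earlier pass writes to a temp file. *)
let plan ~dynlink ~opt_level =
  let ps = passes ~dynlink ~opt_level in
  let n = List.length ps in
  List.mapi (fun i p -> (p, if i = n - 1 then Output_file else Temp i)) ps
```

For instance, [plan ~dynlink:false ~opt_level:1] evaluates to [(Link, Temp 0); (Dce, Output_file)], which matches the "!dynlink + O1" profile.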
Adds a Wasm-backend linear-scan variable coalescing pass, mirroring Js_variable_coalescing. When --opt 1 skips Binaryen, we need our own pass to reuse locals; at --opt 2/3 the new pass is off and Binaryen still handles it.
Adds a new Wasm-backend pass [Local_sink] that rewrites
local.set x e
...
local.get x
into
...
local.tee x e
when the sink is sound: we must not cross another write to [x], and
moving [e] past intermediate code must not reorder observable
effects.
Because [Local_sink] always introduces [local.tee] (never inlines
[e] itself), some of the tees it produces are immediately dead:
the variable is never read again. Extend [Var_coalescing] to
detect such dead tees using the liveness it already computes — a
[LocalTee x _] Def node whose successors do not list [x] in their
[live_in] is a dead store. Replace it with its inner expression,
and drop the local from the function's [locals] list when its
last reference was the tee we just erased.
The forward walker in [Local_sink] is O(N) per candidate, and there can be O(N) candidates per function, making the pass O(N²) on adversarial inputs. Each crossed sub-expression and instruction also recomputes [effect_free], [trivially_pure], and [reads_of_expr] from scratch, so the constant is not small. Add a per-candidate walk budget ([max_walk_distance], 64) and a [tick] helper that is called at every forward step in evaluation order — instruction-to-instruction in [try_sink_in_list], and sibling-to-sibling in [wrap_binary], [wrap_list_intermediate], and the [StructSet]/[ArraySet] arms of [try_sink_in_instr]. Descending into a unary sub-expression is free. When the budget is exhausted the walker bails just as it does for any other cross failure. The bound caps worst-case per-function cost at O(N · K) with K=64. Profitable sinks are almost all within a handful of steps, so the clip is on the long tail rather than the common case.
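A minimal sketch of the budget mechanism, assuming a walker that crosses the elements of a plain list instead of wasm instructions. [max_walk_distance] and [tick] are the names used above; [Budget_exhausted] and [find_within_budget] are hypothetical and only serve the illustration.

```ocaml
(* Raised when the per-candidate budget runs out (hypothetical name). *)
exception Budget_exhausted

let max_walk_distance = 64

(* Walk forward through [items] looking for [target]. Every crossed
   element costs one tick. Returns the number of elements crossed, or
   [None] if the target is absent or the budget is exhausted first. *)
let find_within_budget ?(budget = max_walk_distance) target items =
  let remaining = ref budget in
  let tick () =
    if !remaining = 0 then raise Budget_exhausted;
    decr remaining
  in
  let rec go steps = function
    | [] -> None
    | x :: _ when x = target -> Some steps
    | _ :: rest ->
        tick ();
        go (steps + 1) rest
  in
  (* Running out of budget is treated like any other failed walk. *)
  try go 0 items with Budget_exhausted -> None
```

With the default budget, a target sitting 64 elements ahead is still reached, while one sitting 65 elements ahead is given up on, so the cost per candidate stays bounded regardless of the function's length.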
Wasm encodes local indices as LEB128 u32 (1 byte for indices below 128, 2 bytes from 128 up to 16383). With [wasm-opt] skipped at --opt 1, hot locals routinely cross the 128 boundary and pay an extra byte per access. Add a [Reorder_locals] pass that sorts non-parameter locals into a numeric block followed by a reference block, with same-type runs ordered by descending total use count and individual locals inside a run by descending per-local count. Indices are derived from list position, so reordering [locals] alone suffices. The pass runs at the end of [post_process_function_body], after [Initialize_locals.f]; it is gated on [O1] and the new [wasm-reorder-locals] flag.
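The size argument can be checked with a short sketch. It is a simplification: it ignores parameters (which occupy the first indices) and the numeric/reference grouping, and sorts a flat list of (name, use count) pairs. [uleb128_size], [reorder_by_use] and [index_bytes] are hypothetical names.

```ocaml
(* Bytes needed to encode a non-negative integer as unsigned LEB128:
   each byte carries 7 payload bits. *)
let rec uleb128_size n = if n < 128 then 1 else 1 + uleb128_size (n lsr 7)

(* Toy reordering: most-used locals first. The sort is stable, so locals
   with equal counts keep their relative order. *)
let reorder_by_use (locals : (string * int) list) =
  List.stable_sort (fun (_, a) (_, b) -> compare b a) locals

(* Total bytes spent on index immediates: a local at list position [i]
   costs [uleb128_size i] bytes per use. *)
let index_bytes (locals : (string * int) list) =
  List.fold_left
    (fun (i, acc) (_, uses) -> (i + 1, acc + (uses * uleb128_size i)))
    (0, 0) locals
  |> snd
```

In this model, a local used 10 times that sits at index 128 costs 20 index bytes; once the reordering moves it to index 0 it costs 10.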
When two adjacent generated columns receive identical mappings, the encoder emits only the first. If that first segment is itself skipped because it matches the previous (or is at column 0 after a newline), the old [if i > 0] guard still prepended a comma even though no segment had been written on the current line. Track whether any segment was emitted on the line ([prev >= 0]) and use that instead, so the generated .map stays parseable by wasm-merge.
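The separator rule can be sketched as follows, assuming a simplified model in which a segment the encoder decides to skip is represented as [None]. [encode_line] is a hypothetical name, and the boolean [emitted] plays the role the [prev >= 0] test plays in the actual change.

```ocaml
(* Join the segments of one generated line with commas. A skipped
   segment ([None]) writes nothing and therefore must not cause a
   separator to be written either. *)
let encode_line (segments : string option list) =
  let buf = Buffer.create 64 in
  (* Whether a segment has already been written on this line. *)
  let emitted = ref false in
  List.iter
    (function
      | None -> ()
      | Some s ->
          (* Separate from the previous written segment, not from the
             previous list position. *)
          if !emitted then Buffer.add_char buf ',';
          Buffer.add_string buf s;
          emitted := true)
    segments;
  Buffer.contents buf
```

A guard based on the list position would turn [None; Some "AAAA"] into ",AAAA"; keying the comma on what was actually written yields "AAAA".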
We skip calling wasm-opt with --opt 1. Expected benefits:
Instead, we perform a few optimizations to reduce the code size and avoid having too many variables per function:
local.set
The wasm_of_ocaml compilation time for the test suite is reduced by about 60%. On the CI, dune build @runtest-wasm goes from 2m 19s down to 1m 40s (-28%).
Speedup on ocaml and PRT: