
Handle StructCmpxchg::expected in Heap2Local#8491

Merged
tlively merged 5 commits into main from heap2local on Mar 20, 2026

Conversation


@tlively tlively commented Mar 19, 2026

We previously assumed that if we were optimizing a `StructCmpxchg` in Heap2Local, then the flow must be from the `ref` operand. This assumption rested on allocations being processed in order of appearance: because the `ref` operand appears before the `expected` operand, it must be the one getting optimized. But this neglected the fact that array allocations are processed before struct allocations, so if `expected` is an array, it could be optimized first. This faulty assumption led to assertion failures and invalid code.

Fix the problem by handling flows from `expected` explicitly. When a non-escaping allocation flows into `expected`, we know it cannot possibly match the value already in the accessed struct, so we can optimize the cmpxchg to a `struct.atomic.get`.

The replacement pattern uses a scratch local to propagate the value of the `ref` operand past the dropped `expected` expression to the new `struct.atomic.get`. In case the `ref` operand uses another allocation that will be processed later, we must update the parents map to account for the new data flow from the ref into the scratch local and from the get of the scratch local through to the `struct.atomic.get`, which fully consumes the value.
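As a sketch, the replacement described above has roughly this shape in WAT. The local names are hypothetical, and dropping the `replacement` operand for its side effects is an assumption here, not something the description above states:

```wat
;; Before: a non-escaping allocation flows into `expected`, so the
;; comparison can never succeed and the exchange can never happen.
(struct.atomic.rmw.cmpxchg $struct 0
 (local.get $ref)          ;; the struct being accessed
 (local.get $allocation)   ;; expected: can never match the field
 (local.get $replacement)) ;; replacement: its value is never stored

;; After: stash `ref` in a scratch local so its value survives past the
;; dropped `expected`, then read the field with an atomic get.
(local.set $scratch
 (local.get $ref))
(drop
 (local.get $allocation))
(struct.atomic.get $struct 0
 (local.get $scratch))
```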

WIP because this is not quite correct due to stale LocalGraph information. When `ref` is subsequently optimized, the LocalGraph does not know about the scratch local it is written to, so the `struct.atomic.get` it now flows into is not properly optimized out.
@tlively tlively requested a review from kripken March 19, 2026 02:10
Comment on lines +988 to +990
;; CHECK-NEXT: (struct.atomic.get $struct 0
;; CHECK-NEXT: (local.get $1)
;; CHECK-NEXT: )
@tlively (Member Author) commented:

@kripken, this is not being properly optimized out because of stale LocalGraph information. Do you have a suggestion about what to do? Can we just add information about the scratch local to the LocalGraph?

@tlively (Member Author) commented:

It turns out the LazyLocalGraph handles this fine, but the parents map needed updating. This is now done.

@tlively (Member Author) commented:

It turns out LazyLocalGraph does not handle this fine. Reduced test case and investigation from Gemini:

I've successfully identified, reproduced, and fully reduced the bug found by the Binaryen fuzzer.

The issue was an out-of-bounds access in `wasm::LocalGraphFlower` when handling functions where new locals were added after the initial graph analysis. This occurred during the Heap2Local pass, which optimizes heap allocations into locals and can add new scratch locals in the process.

Reduced Test Case (WAT)

```wat
(module
 (rec
  (type $0 (shared (array i8)))
  (type $1 (sub (struct (field (mut (ref (shared array)))))))
  (type $2 (sub (shared (array (mut i8)))))
  (type $3 (func (param i32)))
 )
 (func $0 (type $3) (param $0 i32)
  (local $1 (ref $2))
  (local $2 (ref $1))
  (local.set $1
   (array.new $2
    (i32.const 0)
    (i32.const 0)
   )
  )
  (drop
   (struct.atomic.rmw.cmpxchg $1 0
    (local.tee $2
     (struct.new $1
      (array.new_fixed $0 0)
     )
    )
    (array.new_fixed $0 0)
    (array.new_default $2
     (i32.const 0)
    )
   )
  )
  (unreachable)
 )
)
```

Root Cause
The Heap2Local pass iterates over allocations and optimizes them one by one. For each allocation, it uses an `EscapeAnalyzer` which relies on a `LazyLocalGraph`. If a previous optimization in the same pass added new locals to the function, the cached internal structures of `LazyLocalGraph` (specifically `getsByIndex` and `setsByIndex` in `LocalGraphFlower`) became stale and were not sized to include the new locals. When `EscapeAnalyzer` then encountered an access to one of these new locals, it triggered an assertion failure (or a segmentation fault in release builds) due to an out-of-bounds vector access.

Fix
I implemented a minimal and robust fix in src/ir/LocalGraph.cpp by adding bounds checks when accessing `getsByIndex` and `setsByIndex`. This ensures that any locals added after the flower was built are safely treated as having no influences, preventing the crash while maintaining compatibility with the existing pass structure.

I have verified that this fix allows the original failing seed (4231141684229755435) to pass successfully and does not introduce regressions in existing Heap2Local lit tests.

Obviously this fix won't work for the scratch locals, though. I'll have to go back to tracking their gets and sets separately on the side.

@tlively tlively changed the title from "[WIP] Handle StructCmpxchg::expected in Heap2Local" to "Handle StructCmpxchg::expected in Heap2Local" Mar 19, 2026
@tlively tlively requested a review from a team as a code owner March 19, 2026 21:36
@tlively tlively requested review from aheejin and removed request for a team March 19, 2026 21:36

tlively commented Mar 19, 2026

@kripken, PTAL at the latest commit, which avoids looking up scratch locals in the LazyLocalGraph.


kripken commented Mar 19, 2026

Looks reasonable, but then all the other scratch locals need to be added to that data structure, I think?


tlively commented Mar 19, 2026

Possibly, if we want to be totally consistent at the risk of doing extra work. WDYT about doing so only as the fuzzer finds problems?


kripken commented Mar 19, 2026

Why is there a risk? I mean, why would it not be needed in those places?


tlively commented Mar 19, 2026

For example we use scratch locals when optimizing allocations flowing through the ref field of struct.rmw.xchg (not cmpxchg now). But the allocation flowing into the replacement field there escapes, so after we optimize the ref field, we will never optimize anything flowing through what was the struct.rmw.xchg again, so we will never look at the scratch locals. And the other RMW instructions don't even accept references in the replacement field. I'm pretty sure something similar happens to also make this unnecessary for ref.cast_desc_eq, which is the other place where we use scratch locals.


kripken commented Mar 19, 2026

I see, thanks. Well, waiting on fuzzer or other issues sounds reasonable then. But please add a comment explaining this on the data structure - that it will not contain all scratch locals, only ones we actually need to know about.

@tlively tlively enabled auto-merge (squash) March 20, 2026 01:01
@tlively tlively merged commit 5646ea6 into main Mar 20, 2026
16 checks passed
@tlively tlively deleted the heap2local branch March 20, 2026 01:30