Conversation
We previously assumed that if we were optimizing a StructCmpxchg in Heap2Local, the flow must come from the `ref` operand. This was based on the assumption that allocations are processed in order of appearance: because the `ref` operand appears before the `expected` operand, it must be the one getting optimized. But this neglected the fact that array allocations are processed before struct allocations, so if `expected` is an array, it can be optimized first. This faulty assumption led to assertion failures and invalid code. Fix the problem by handling flows from `expected` explicitly. When a non-escaping allocation flows into `expected`, we know it cannot possibly match the value already in the accessed struct, so we can optimize the `cmpxchg` to an atomic `struct.get`. WIP because this is not quite correct due to stale LocalGraph information: when `ref` is subsequently optimized, the LocalGraph does not know about the scratch local it is written to, so the `struct.atomic.get` it now flows into is not properly optimized out.
test/lit/passes/heap2local-rmw.wast
Outdated
```wast
;; CHECK-NEXT: (struct.atomic.get $struct 0
;; CHECK-NEXT:  (local.get $1)
;; CHECK-NEXT: )
```
@kripken, this is not being properly optimized out because of stale LocalGraph information. Do you have a suggestion about what to do? Can we just add information about the scratch local to the LocalGraph?
It turns out the LazyLocalGraph handles this fine, but the parents map needed updating. This is now done.
It turns out LazyLocalGraph does not handle this fine after all. Reduced test case and investigation from Gemini:
I've successfully identified, reproduced, and fully reduced the bug found by the Binaryen fuzzer. The issue was an out-of-bounds access in wasm::LocalGraphFlower when handling functions where new locals were added after the initial graph analysis. This occurred during the Heap2Local pass, which optimizes heap allocations into locals and can add new scratch locals in the process.

Reduced Test Case (WAT):

```wast
(module
  (rec
    (type $0 (shared (array i8)))
    (type $1 (sub (struct (field (mut (ref (shared array)))))))
    (type $2 (sub (shared (array (mut i8)))))
    (type $3 (func (param i32)))
  )
  (func $0 (type $3) (param $0 i32)
    (local $1 (ref $2))
    (local $2 (ref $1))
    (local.set $1
      (array.new $2
        (i32.const 0)
        (i32.const 0)
      )
    )
    (drop
      (struct.atomic.rmw.cmpxchg $1 0
        (local.tee $2
          (struct.new $1
            (array.new_fixed $0 0)
          )
        )
        (array.new_fixed $0 0)
        (array.new_default $2
          (i32.const 0)
        )
      )
    )
    (unreachable)
  )
)
```

Root Cause
The Heap2Local pass iterates over allocations and optimizes them one by one. For each allocation, it uses an EscapeAnalyzer which relies on a LazyLocalGraph. If a previous optimization in the same pass added new locals to the function, the cached internal structures of LazyLocalGraph (specifically getsByIndex and setsByIndex in LocalGraphFlower) became stale and were not sized to include the new locals. When EscapeAnalyzer then encountered an access to one of these new locals, it triggered an assertion failure (or a segmentation fault in Release builds) due to an out-of-bounds vector access.

Fix

I implemented a minimal and robust fix in src/ir/LocalGraph.cpp by adding bounds checks when accessing getsByIndex and setsByIndex. This ensures that any locals added after the flower was built are safely treated as having no influences, preventing the crash while maintaining compatibility with the existing pass structure.

I have verified that this fix allows the original failing seed (4231141684229755435) to pass successfully and does not introduce regressions in existing Heap2Local lit tests.
Obviously this fix won't work for the scratch locals, though. I'll have to go back to tracking their gets and sets separately on the side.
@kripken, PTAL at the latest commit, which avoids looking up scratch locals in the LazyLocalGraph.

Looks reasonable, but then all the other scratch locals need to be added to that data structure, I think?

Possibly, if we want to be totally consistent at the risk of doing extra work. WDYT about doing so only as the fuzzer finds problems?

Why is there a risk? I mean, why would it not be needed in those places?

For example we use scratch locals when optimizing allocations flowing through the

I see, thanks. Well, waiting on the fuzzer or other issues sounds reasonable then. But please add a comment explaining this on the data structure - that it will not contain all scratch locals, only ones we actually need to know about.
We previously assumed that if we were optimizing a StructCmpxchg in Heap2Local, then the flow must be from the `ref` operand. This was based on an assumption that allocations are processed in order of appearance, and because the `ref` operand appears before the `expected` operand, it must be the one getting optimized. But this neglected the fact that array allocations are processed before struct allocations, so if `expected` is an array, it could be optimized first. This faulty assumption led to assertion failures and invalid code.

Fix the problem by handling flows from `expected` explicitly. When a non-escaping allocation flows into `expected`, we know it cannot possibly match the value already in the accessed struct, so we can optimize the `cmpxchg` to a `struct.atomic.get`.

The replacement pattern uses a scratch local to propagate the value of the `ref` operand past the dropped `expected` expression to the new `struct.atomic.get`. In case the `ref` operand uses another allocation that will be processed later, we must update the parents map to account for the new data flow from the `ref` into the scratch local and from the get of the scratch local through to the `struct.atomic.get`, which fully consumes the value.