Allow typecast fusion #602
Resolve conflicts.
zoecarver left a comment
Very cool! LGTM. Thank you, Vlad!
@@ -557,7 +589,7 @@ static LogicalResult buildFusedCompute(Operation *sinkOp,
  Value inputTile = tensorToTile[bcastOp.getInput()];
  Value outputTile = body->getArguments().back(); // output block arg
  auto bcastTileOp = createTileOpWithPlaceholderDstIndex<TileBcastOp>(
This still uses outputTileType, which is the final fused sink type, not necessarily this particular op's result type. That breaks cases like typecast(bcast(...)), where the bcast result should keep its own dtype and the later ttl.tile_typecast should perform the conversion.
The same applies to the matmul special cases below. These special-case emitters should derive the result tile type from the current source op, like emitTileOpFor() does.
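The point of this comment can be illustrated with a small standalone sketch. It does not use MLIR or the project's real types: `DType`, `TileType`, `Op`, `tileTypeFromSink`, and `tileTypeFromOp` are all hypothetical stand-ins invented for illustration. It only shows why a tile type taken from the fused sink differs from one taken from the op being lowered in a chain like typecast(bcast(...)).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real MLIR types; none of these names
// come from the PR itself.
enum class DType { BF16, F32 };

struct TileType {
  DType dtype;
};

struct Op {
  std::string name;   // e.g. "bcast" or "typecast"
  DType resultDType;  // element type of this op's own tensor result
};

// Behaviour the review objects to: every emitter reuses the tile type of
// the final fused sink, regardless of which op is being lowered.
TileType tileTypeFromSink(const std::vector<Op> &chain) {
  assert(!chain.empty() && "a fused chain needs at least a sink op");
  return TileType{chain.back().resultDType};
}

// Behaviour the review asks for: each emitter derives the tile type from
// the op it is currently lowering.
TileType tileTypeFromOp(const Op &op) {
  return TileType{op.resultDType};
}
```

In this toy model, a chain of a bf16 bcast followed by a typecast to f32 gives the bcast an f32 tile under the sink-based rule and a bf16 tile under the per-op rule, so only the per-op rule leaves the conversion to the typecast step.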
    }
  }
  for (int64_t cb : conflicts) {
    fpuCBs[cb]->emitWarning()
Should this be a pass error instead of an easy-to-miss warning, since it could lead to losing f32 precision?
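To make the question concrete, here is a rough standalone sketch of conflict detection with a strict mode. It is not the PR's implementation: `Engine`, `CBUse`, `findConflicts`, and `verifyNoConflicts` are hypothetical names, and CBs are modelled as plain integer indices. The idea is that the same conflict list can feed either a warning or a hard failure.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical model: which compute engine consumes a given CB.
enum class Engine { FPU, SFPU };

struct CBUse {
  int64_t cb;     // circular buffer index (assumed non-negative)
  Engine engine;  // engine that consumes this CB
};

// Collect CB indices that are consumed by more than one engine.
// std::map iterates in key order, so the result is sorted ascending.
std::vector<int64_t> findConflicts(const std::vector<CBUse> &uses) {
  std::map<int64_t, std::set<Engine>> enginesPerCB;
  for (const CBUse &use : uses) {
    assert(use.cb >= 0 && "CB index must be non-negative");
    enginesPerCB[use.cb].insert(use.engine);
  }
  std::vector<int64_t> conflicts;
  for (const auto &entry : enginesPerCB) {
    if (entry.second.size() > 1)
      conflicts.push_back(entry.first);
  }
  return conflicts;
}

// Strict variant: any conflict makes verification fail, which is the
// "pass error" behaviour suggested in the comment above.
bool verifyNoConflicts(const std::vector<CBUse> &uses) {
  return findConflicts(uses).empty();
}
```

A caller that prefers the current behaviour could loop over `findConflicts` and only warn, while a stricter pass could abort when `verifyNoConflicts` returns false.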
Can you maybe add a couple fused testcases like
Problem description
ttl.typecast could not participate in fused ttl.compute regions, which forced dtype-changing expressions to break fusion or fail during TTL-to-compute lowering. This also made mixed dtype fusion behavior unclear, especially when f32 inputs required unpack_to_dest_fp32 handling.

What's changed
This change enables ttl.typecast fusion by deriving tile result types from each source op's tensor result instead of relying on a single default tile type. It updates compute lowering, Python AST/API type handling, and kernel config assignment so fused compute regions can preserve dtype-changing intermediates correctly.

The compute kernel config logic now tracks unpack_to_dest_fp32 per CB and detects conflicts where the same f32 CB is consumed by both FPU and SFPU strategies. Tests were added and updated to cover typecast fusion, mixed dtype rejection, and unpack_to_dest_fp32 positive, negative, and conflict cases.

Ticket
#264

Checklist