Add PTODSL pipe surface APIs #459
6581133 to 9f41d97
| static bool isInsideSectionCube(Operation *op) { | ||
| return op->getParentOfType<pto::SectionCubeOp>() != nullptr; | ||
| for (Operation *parent = op; parent; parent = parent->getParentOp()) |
This is mainly to support the module-level `pto.kernel_kind` of PTODSL nested modules. The traversal is probably unnecessary; I will revise it.
| FrontendPipeHandleMap handlesById; | ||
| SmallVector<Operation *> frontendInitOps; | ||
| llvm::DenseMap<int32_t, Operation *> initOpById; | ||
| llvm::DenseMap<int64_t, Operation *> initOpByKey; |
The key here is (pipe id, kernel side), not just the pipe id. The same logical pipe id appears on both the cube-side init and the vector-side init, so when lowering looks up a handle it has to match on the side of the current op.
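The (pipe id, kernel side) keying described in this comment can be modelled in a few lines. This is a minimal Python sketch, not the PR's C++ lowering: `KernelSide`, `pack_key`, and `InitOpTable` are hypothetical names, and the bit layout used to pack the pair into one integer is an assumption chosen only to mirror an `int64_t` map key.

```python
from enum import IntEnum


class KernelSide(IntEnum):
    """Hypothetical stand-in for the kernel side an op belongs to."""
    CUBE = 0
    VECTOR = 1


def pack_key(pipe_id, side):
    """Pack (pipe id, kernel side) into a single integer key.

    Assumed layout: the side occupies the low bit, the pipe id the bits above.
    """
    if pipe_id < 0:
        raise ValueError("pipe id must be non-negative")
    return (pipe_id << 1) | int(side)


class InitOpTable:
    """Maps (pipe id, kernel side) to the init op registered for that side."""

    def __init__(self):
        self._by_key = {}

    def register(self, pipe_id, side, init_op):
        key = pack_key(pipe_id, side)
        if key in self._by_key:
            raise ValueError(f"duplicate init for pipe {pipe_id} on {side.name}")
        self._by_key[key] = init_op

    def lookup(self, pipe_id, side):
        # Returns None when no init op was registered for this pair.
        return self._by_key.get(pack_key(pipe_id, side))


# The same logical pipe id is registered once per side without colliding.
table = InitOpTable()
table.register(0, KernelSide.CUBE, "cube_init_pipe0")
table.register(0, KernelSide.VECTOR, "vector_init_pipe0")
```

Keying on the pipe id alone would make the second `register` call overwrite or reject the first, which is the collision the composite key avoids.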
9f41d97 to 90a92e9
| `push`) on the Cube side and C2V-consumer methods (`init_simd`, `pop`, `free`) | ||
| on the Vector side. | ||
| #### `pto.pipe.v2c_global(gm_slot_tensor, *, id, slot_size=None, nosplit=None)` |
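- A mix kernel may use multiple pipes. How are push/pop associated with the pipe that carries the data, and how are the Cube-side and Vector-side pipes associated with each other? At the IR level, the association is made through the pipe's id and the id passed to push/pop.
- Does the DSL need to distinguish pipe locations such as global/local? In my view, it is enough to define the semantics of the interface parameters clearly and use the target parameter at the kernel entry to decide which parameters are needed. The current design may have compatibility problems.
- For A3/A5 interface compatibility at the IR level, the A3 architecture also needs to pass the consumer's address into the pipe, not only the GM address.
- Does the GM address need to be defined as gm_slot_tensor? Only a GM pointer is actually needed; the other information in gm_slot_tensor (a TensorView) should go unused during lowering, so it is somewhat redundant.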
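Reply:
- Multi-pipe association: pipes are now associated by stable pipe id + direction + kernel side. The lowering key has been changed to (pipe id, kernel side), and the docs now state that multiple pipes need distinct stable ids.
- global/local API: the public `c2v_global` / `c2v_local` / `v2c_global` / `v2c_local` constructors have been removed and unified into `pto.pipe.c2v(...)` / `pto.pipe.v2c(...)`. Local versus global-entry is distinguished by parameter semantics.
- A3 consumer address: fixed. Global-entry lowering now requires both `gm_slot_buffer` and the consumer local buffer, and passes them to `InitializeL2G2LPipeOp`.
- gm_slot_tensor: `gm_slot_buffer` is the actual GM address; `gm_slot_tensor` is kept for now as the entry descriptor, used for type/shape/slot_size inference and verifier checks.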
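The operand-driven selection described in this reply can be sketched as a small classifier. This is an illustrative Python model only: `PipeSpec` and `classify_entry_mode` are hypothetical names, and the rules (GM operands select global-entry, which then also requires the GM slot buffer and a consumer local buffer; otherwise the pipe is local) are an assumption drawn from the reply, not the PTODSL implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PipeSpec:
    """Hypothetical bundle of the operands a pipe constructor might receive."""
    pipe_id: int
    consumer_buf: Optional[str] = None
    gm_slot_buffer: Optional[str] = None
    gm_slot_tensor: Optional[str] = None


def classify_entry_mode(spec):
    """Return "global-entry" or "local" based on which operands are present."""
    has_gm = spec.gm_slot_buffer is not None or spec.gm_slot_tensor is not None
    if not has_gm:
        # Assumption: a local pipe is described by its consumer buffer alone.
        if spec.consumer_buf is None:
            raise ValueError("a local pipe needs consumer_buf")
        return "local"
    # Assumed rule from the reply: global-entry lowering needs both the GM
    # slot buffer and the consumer local buffer.
    missing = [name for name in ("gm_slot_buffer", "consumer_buf")
               if getattr(spec, name) is None]
    if missing:
        raise ValueError("global-entry pipe is missing: " + ", ".join(missing))
    return "global-entry"


local_spec = PipeSpec(pipe_id=0, consumer_buf="c2v_fifo")
global_spec = PipeSpec(pipe_id=1, consumer_buf="c2v_fifo",
                       gm_slot_buffer="gm_ptr", gm_slot_tensor="gm_view")
```

Selecting the mode from the operands keeps one constructor per direction, at the cost of having to reject incomplete operand combinations explicitly.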
7eb3958 to 752d5ba
| BLOCK: pto.constexpr = 128, | ||
| ): | ||
| gm_view = pto.make_tensor_view(gm_slot_buffer, shape=[16, 16], strides=[16, 1]) | ||
| c2v_buf = pto.reserve_buffer("c2v_fifo", size=8192, location="vec") |
Please confirm: should this be import_reserve_buffer?
Reply: This is not written quite right. A global-entry pipe is a global-only GM FIFO, and initialization only needs gm_slot_tensor. These reserve_buffer and import_reserve_buffer calls should not be needed; only gm_slot_tensor is.
| @pto.simd | ||
| def vector_kernel(): | ||
| c2v.init_simd() | ||
| entry = c2v.pop(split=0, result_type=c2v.entry_type) |
I don't follow this part:
- What is result_type?
- Isn't the result of pop a tile? Why is it then used as a view?
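Reply: For a global-entry pipe, pop() uses entry_type by default and returns a TensorView descriptor for the current GM FIFO slot. The user then derives a sub-view from this descriptor and explicitly calls tile.load. The example has been changed to `entry = c2v.pop(split=0)` so that it uses the pipe's entry_type by default, and the related explanation has been added.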
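The pop-then-sub-view flow can be illustrated with a toy descriptor. This is a plain-Python sketch, not the PTODSL API: `SlotView` and its `subview` method are hypothetical, and the 16x16 shape with strides [16, 1] is borrowed from the `make_tensor_view` call in the reviewed example. It only shows that a sub-view is another descriptor over the same slot storage, which is why a separate load step is still needed to obtain tile data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SlotView:
    """Hypothetical 2-D descriptor: element offset, shape, and strides."""
    offset: int
    shape: Tuple[int, int]
    strides: Tuple[int, int]

    def subview(self, row0, col0, rows, cols):
        """Return a descriptor for a rectangular window of this view."""
        if row0 < 0 or col0 < 0 or rows <= 0 or cols <= 0:
            raise ValueError("sub-view origin and extent must be valid")
        if row0 + rows > self.shape[0] or col0 + cols > self.shape[1]:
            raise ValueError("sub-view exceeds the slot bounds")
        new_offset = (self.offset
                      + row0 * self.strides[0]
                      + col0 * self.strides[1])
        # Strides are inherited: no data is copied or moved.
        return SlotView(new_offset, (rows, cols), self.strides)

    def element_offsets(self):
        """List the element offsets this descriptor covers, row by row."""
        return [self.offset + r * self.strides[0] + c * self.strides[1]
                for r in range(self.shape[0])
                for c in range(self.shape[1])]


# Stand-in for the descriptor a global-entry pop() would hand back.
entry = SlotView(offset=0, shape=(16, 16), strides=(16, 1))
# Derive a window covering the lower 8 rows of the slot.
lower_half = entry.subview(8, 0, 8, 16)
```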
Summary
This PR adds the reusable PTODSL pipe surface/frontend API work, based on Zhendong404 commit a382a51064e5c3c3a160377ee778dca88c5f91bf, plus the pipe transaction support needed by the FA PTODSL frontend path.
Main changes:
- `pto.pipe` high-level namespace.
- `pto.pipe.c2v(...)`, `pto.pipe.v2c(...)`, `pto.pipe.bidirectional(...)`
- `consumer_buf`, `gm_slot_buffer`, `gm_slot_tensor`, `slot_size`
- `pto.reserve_buffer(...)`, `pto.import_reserved_buffer(...)`, `pto.gm_ptr(...)`
- `pipe.push(...)`, `pipe.pop(...)`, `pipe.free(...)`
- `@pto.cube` / `@pto.simd` section scopes.
- `ptodsl/docs/user_guide/07-data-movement-ops.md` with the new pipe surface usage.
Notes
- The public API no longer exposes separate `*_global` / `*_local` constructor names. The pipe direction is part of the constructor name, while local/global-entry behavior is selected by the provided operands.
- `gm_slot_buffer` is the actual GM FIFO storage pointer. `gm_slot_tensor` is kept as the entry descriptor for type/shape/slot-size inference and verification.
- This PR is intentionally separate from the FA PTODSL PR and only covers the reusable PTODSL pipe surface/frontend API work.
Validation
Run on `dev-481211`:
- `ninja -C build` passed.
- `python3 ptodsl/tests/test_vector_cube_ops.py -v` passed.
- `python3 ptodsl/tests/test_pipe_surface_sample_compile.py` passed.
- `python3 ptodsl/tests/test_docs_as_test.py` passed.
- `ptoas --pto-arch=a3`.