Tracking issue for small items found while refactoring the transfer IR (SELF_COPY/SHADOW → transport ∈ {P2P, BROADCAST, LOCAL, NONE} + orthogonal is_source/is_target). They were intentionally left out of that PR to keep it scoped to the role/transport split and the self-copy bucket-finalize fix. Collecting them here to do in one pass.
1. execute() should own self.work
Transferable.execute() returns the dist.Work handle, and the only two callers immediately assign it back:
src/etha/comm/execution.py:90 — chunk.work = chunk.execute()
src/etha/comm/ir.py:218 — self.work = self.execute() (in Bucket.launch)
Move the assignment into execute() so it sets self.work directly; call sites become chunk.execute() / self.execute(). Trade-off: execute() mutates self instead of being side-effect-free — but since neither caller uses the return value for anything else, the "pure" property buys nothing. Net simplification.
2. Overlapping source/target meshes (a rank in both) are not correctly handled
Latent — production uses disjoint meshes, so none of these fire. The current single-tensor local (self-copy) chunk encodes "read and write the same tensor", which only holds when the source and target tensors coincide (as in the unit test).
- (a) Wrong-tensor read. The local chunk is built with
tensor=target_tensor but reads src_slice (a source-grid slice) — src/etha/comm/get_chunks.py (the Transport.LOCAL branch, ~L100-110). When a rank holds distinct source and target tensors, this reads target_tensor[src_slice] instead of source_tensor[src_slice]. Proper fix: split the single tensor field into read_tensor/write_tensor (then is_source reads read_tensor[src_slice], is_target writes write_tensor[dst_slice], self-copy uses both).
- (b) Skipped Partial reduce. The local chunk doesn't carry
source_partial_groups, so a rank that is both a Partial source and a target won't reduce before its local write.
- (c) P2P-to-self hang. A route whose sole dst is the source rank emits a P2P
is_source chunk that isends to itself with no matching irecv (self-delivery goes through the LOCAL chunk).
All three disappear if overlapping meshes are either (i) properly supported — read/write tensor split + the self-cell routed only through LOCAL, never P2P, with the reduce applied — or (ii) explicitly rejected at map-build time. Decision needed: support vs reject.
Regression anchor: tests/test_self_copy_bucket.py covers the single-tensor LOCAL path. A distinct-tensor test would expose (a).
More to be appended as review continues.
Tracking issue for small items found while refactoring the transfer IR (
SELF_COPY/SHADOW→transport ∈ {P2P, BROADCAST, LOCAL, NONE}+ orthogonalis_source/is_target). They were intentionally left out of that PR to keep it scoped to the role/transport split and the self-copy bucket-finalize fix. Collecting them here to do in one pass.1.
execute()should ownself.workTransferable.execute()returns thedist.Workhandle, and the only two callers immediately assign it back:src/etha/comm/execution.py:90—chunk.work = chunk.execute()src/etha/comm/ir.py:218—self.work = self.execute()(inBucket.launch)Move the assignment into
execute()so it setsself.workdirectly; call sites becomechunk.execute()/self.execute(). Trade-off:execute()mutatesselfinstead of being side-effect-free — but since neither caller uses the return value for anything else, the "pure" property buys nothing. Net simplification.2. Overlapping source/target meshes (a rank in both) are not correctly handled
Latent — production uses disjoint meshes, so none of these fire. The current single-
tensorlocal (self-copy) chunk encodes "read and write the same tensor", which only holds when the source and target tensors coincide (as in the unit test).tensor=target_tensorbut readssrc_slice(a source-grid slice) —src/etha/comm/get_chunks.py(theTransport.LOCALbranch, ~L100-110). When a rank holds distinct source and target tensors, this readstarget_tensor[src_slice]instead ofsource_tensor[src_slice]. Proper fix: split the singletensorfield intoread_tensor/write_tensor(thenis_sourcereadsread_tensor[src_slice],is_targetwriteswrite_tensor[dst_slice], self-copy uses both).source_partial_groups, so a rank that is both a Partial source and a target won't reduce before its local write.is_sourcechunk thatisends to itself with no matchingirecv(self-delivery goes through the LOCAL chunk).All three disappear if overlapping meshes are either (i) properly supported — read/write tensor split + the self-cell routed only through
LOCAL, never P2P, with the reduce applied — or (ii) explicitly rejected at map-build time. Decision needed: support vs reject.Regression anchor:
tests/test_self_copy_bucket.pycovers the single-tensor LOCAL path. A distinct-tensor test would expose (a).More to be appended as review continues.