Releases: Ascend/TransferQueue
Releases · Ascend/TransferQueue
v0.1.8
Highlight
🚀 New Features
- Cross-Job Actor Discovery: Support cross-job actor discovery via an explicit namespace, improving usability in multi-job environments. (#115)
- RDMA Support for openYuanrong: Enable openYuanrong RDMA support to leverage high-performance network transmission. (#108)
⚙️ Refactor
- Unified Bootstrap Path: Introduce a dedicated bootstrap subfolder to isolate and organize the initialization codes for different storage backends, enhancing codebase scalability. (#103)
- Zero-Copy Serialization Utilities: Provide general serialization tools for KV backends in
serial_utils.py. The newly introducedbatch_encode_intoandbatch_decode_frominterfaces support zero-copy serialization, aligning with the RDMA transmission requirements of various backends. (#107)
# Example 1: batch_encode_into
def _put_bytes_thread_worker(self, batch_keys: list[str], batch_values: list[Any]) -> list[int]:
"""Worker thread for putting batch of non-tensors to MooncakeStore."""
# TODO: switch to a pre-registered buffer from MooncakeStore once such an API is available.
region_ptrs: list[int] = []
region_sizes: list[int] = []
def alloc(sizes: list[int]) -> list[Tensor]:
nonlocal region_ptrs, region_sizes
# `batch_packed_sizes` are byte counts. With torch.uint8 (1 byte/element),
# a 1-D shape of (N,) corresponds to exactly N bytes. We use
# `allocate_empty_tensors` to get N uint8 views over a single contiguous,
# register-able region. These are plain byte buffers, not real tensors;
# consumers apply the actual dtype/shape interpretation when unpacking.
dtypes = [torch.uint8] * len(sizes)
shapes = [(s,) for s in sizes]
buffers, _, region_ptrs, region_sizes = allocate_empty_tensors(dtypes, shapes)
return buffers
buffers, batch_sizes = serial_utils.batch_encode_into(
batch_values, alloc, num_workers=MAX_SERIAL_WORKER_THREADS
)
batch_ptrs = [cast(Tensor, b).data_ptr() for b in buffers]
self._register_all_buffers(region_ptrs, region_sizes)
try:
self._batch_upsert_with_retry(batch_keys, batch_ptrs, batch_sizes)
finally:
self._unregister_all_buffers(region_ptrs)
return batch_sizes
# Example 2: batch_decode_from
def _get_bytes_thread_worker(
self, batch_keys: list[str], batch_packed_sizes: list[int], indexes: list[int]
) -> tuple[list[Any], list[int]]:
# `batch_packed_sizes` are byte counts. With torch.uint8 (1 byte/element),
# a 1-D shape of (N,) corresponds to exactly N bytes. We use
# `allocate_empty_tensors` to get N uint8 views over a single contiguous,
# register-able region. These are plain byte buffers, not real tensors;
# consumers apply the actual dtype/shape interpretation when unpacking.
batch_shapes = [(sz,) for sz in batch_packed_sizes]
batch_dtypes = [torch.uint8] * len(batch_keys)
batch_nbytes = get_nbytes(batch_dtypes, batch_shapes)
batch_buffer_tensors, batch_buffer_ptrs, region_ptrs, region_sizes = allocate_empty_tensors(
batch_dtypes, batch_shapes
)
self._register_all_buffers(region_ptrs, region_sizes)
try:
self._batch_get_into_with_retry(batch_keys, batch_buffer_ptrs, batch_nbytes)
finally:
self._unregister_all_buffers(region_ptrs)
return serial_utils.batch_decode_from(batch_buffer_tensors), indexes🐛 Fixes & Improvements
- Concurrency Conflict Fix: Resolve concurrency conflicts between data status updates and other control operations in
TransferQueueController. (#116) - Dedicated Notification Loop: Isolate
notify_data_updateZMQ I/O into a dedicated background asyncio loop to prevent ACK timeouts inStorageManager. (#117)
What's Changed
- [chore] Bump version from 0.1.7 to 0.1.8.dev0 by @0oshowero0 in #100
- [recipe] Provide Relax style recipe by @Jixixi2020 in #93
- [refactor] Register storage backend for greater scalability by 🎉@fy2462 in #103
- [doc] Adjust yuanrong backend doc by @dpj135 in #104
- [feat] Enable openYuanrong RDMA support by @KaisennHu in #108
- [chore] Optimize config descriptions for better understanding by @0oshowero0 in #109
- [refactor] Provide common serialization tools for KV backends to speed up tensor serial in nested values by 🎉@xupinjie in #107
- [refactor] Use
batch_encode_into/batch_decode_fromfor Yuanrong backend by @dpj135 in #110 - [perf] Enable multi-thread serial for non-tensor values in MooncakeStore backend by @0oshowero0 in #111
- [chore] Relax numpy version constraints by @0oshowero0 in #113
- [feat] Support cross-job actor discovery via explicit namespace by 🎉@huniu20 in #115
- [fix,refactor] Merge
update_data_statusthread/socket intorequest_handleto eliminate concurrency conflicts by @dodatboii in #116 - [fix,refactor] Isolate
notify_data_updateZMQ I/O into a dedicated background asyncio loop by @dodatboii in #117 - [chore] Update README & bump version from 0.1.8.dev0 to 0.1.8 by @0oshowero0 in #118
New Contributors
- @Jixixi2020 made their first contribution in #93
- @fy2462 made their first contribution in #103
- @xupinjie made their first contribution in #107
- @huniu20 made their first contribution in #115
Full Changelog: v0.1.7...v0.1.8
v0.1.7
Highlight
🚀 New Features
- User-Defined Data Parser: Support user-defined data parsers to dynamically materialize URLs or file paths inside
SimpleStorage. (#82)
- Observability: Provide metrics exporter and Grafana dashboard integration for comprehensive system monitoring. (#83)
⚙️ Backends
- MooncakeStore:
- Yuanrong:
🐛 Fixes & Improvements
- Default Tensor Type Update: Use jagged tensor as the default tensor type to prevent user-side judgement errors in certain corner cases. (#92)
- Fix semantic inconsistencies in
BatchMeta.union. (#95) - Allow
Nonevalues in_pack_field_valuesand fallback toNonTensorStack. (#75) - Clean up legacy warnings for the mooncake upsert API. (#91)
What's Changed
- [chore] Bump version from 0.1.6 to 0.1.7.dev0 by @0oshowero0 in #73
- [fix] Allow None values in
_pack_field_valuesand fallback toNonTensorStackby 🎉@NINGBENZHE in #75 - [fix] Imporved robustness of
yuanrong_clientwhen callingclear_partitionby @dpj135 in #76 - [chore] Update README by @0oshowero0 in #80
- [chore] Update README: TransferQueue has been adopted in Relax by @0oshowero0 in #81
- [feat] Support metastore mode for Yuanrong backend init by @KaisennHu in #74
- [feat] Support user-defined data parser for
SimpleStoragebackend by @0oshowero0 in #82 - [client, storage] refactor: unify dynamic ZMQ socket decorator between simple_backend_manager and client by @ji-huazhong in #66
- [misc] refactor: extract
get_loggerutility to reduce code duplication by @ji-huazhong in #84 - [perf] Refactor MooncakeStore backend with zero-copy upsert API by @0oshowero0 in #77
- [misc] refactor: simplify internal classes naming by @0lynnlin0 in #86
- [BREAKING][fix] Use jagged tensor as default tensor type by @0oshowero0 in #92
- [fix] Remove legacy warnings for mooncake upsert API by 🎉 @stmatengss in #91
- [feat] Add metrics exporter and dashboard for TransferQueue by 🎉@RobotGF in #83
- [chore] Update openYuanrong related expression by @0oshowero0 in #96
- [fix] Fix
BatchMeta.unionsemantics by @0oshowero0 in #95 - [feat] Add retransmission mechanism for
MooncakeStoreClientby @0oshowero0 in #94 - [fix] Increase
mooncake_masterclient_ttl to mitigate client heartbeat timeout by @0oshowero0 in #99 - [chore] Bump version from 0.1.7.dev0 to 0.1.7 by @0oshowero0 in #97
New Contributors
- @KaisennHu made their first contribution in #74
- @0lynnlin0 made their first contribution in #86
- @stmatengss made their first contribution in #91
- @RobotGF made their first contribution in #83
Full Changelog: v0.1.6...v0.1.7
v0.1.6
Highlight
Interface
- Support easy init through
tq.init(config) - Support key-value based API: https://github.com/Ascend/TransferQueue/blob/main/tutorial/basic.ipynb
- Experimentally support
StreamingDataLoader, with customizableSeqlenBalancedSampler: https://github.com/Ascend/TransferQueue/blob/main/tutorial/06_streaming_dataloader.py 🎉 @NINGBENZHE @Aurelius84
Performance
- Refactor
BatchMetastructure to reduce computational complexity: https://gitcode.com/Ascend/TransferQueue/pull/28、https://gitcode.com/Ascend/TransferQueue/pull/29 - Optimize zero-copy implementation: https://gitcode.com/Ascend/TransferQueue/pull/7、https://gitcode.com/Ascend/TransferQueue/pull/13、https://gitcode.com/Ascend/TransferQueue/pull/23
Backends
- Support
MooncakeStorebackend 🎉 @zhaohaidao - Support
Yuanrongbackend
Recipe
- Refactor single controller integration demo: https://github.com/Ascend/TransferQueue/blob/main/recipe/simple_use_case/single_controller_demo.py 🎉 @vermouth1992
CI
- Add E2E consistency tests: https://gitcode.com/Ascend/TransferQueue/pull/25, #28
What's Changed
- [StreamingDataLoader, 1/N] feat: implement RankAwareSampler by @0oshowero0 in #4
- [StreamingDataLoader, 2/N] feat: support async sampling and data pre-fetch in RankAwareSampler by @0oshowero0 in #7
- [feat] Support store custom_meta in controller for backend-specific info by @tianyi-ge in #5
- [feat] Provide fine-grained production & consumption status retrieval by @0oshowero0 in #8
- [StreamingDataLoader, 3/N] feat: implement StreamingDataSet and StreamingDataLoader by @0oshowero0 in #9
- [chore] Add sanity checks for docstring, license and DCO by @0oshowero0 in #10
- [fix] Fix UT of
YuanrongStorageClientby @Evelynn-V in #12 - [feat] Improve
TransferQueueClientsync API compatibility with async contexts by @0oshowero0 in #11 - [chore] Update the code repository link in README by @dpj135 in #14
- [chore] Optimize pyproject and CI script by @0oshowero0 in #13
- [StreamingDataLoader, 4/N] feat: Introduce sample pre-allocation for dynamic streaming by @0oshowero0 in #16
- [chore] Update README by @0oshowero0 in #17
- [fix] Add back
TQ_ZERO_COPY_SERIALIZATIONswitch by @0oshowero0 in #19 - [chore] Update README with WeChat group by @0oshowero0 in #22
- [StreamingDataLoader, 5/N] Refactor
StreamDataLoaderimplementation by @NINGBENZHE in #23 - [feat] Provide user-defined
custom_metamethods by @0oshowero0 in #21 - [refactor] Refactor yuanrong_client by @dpj135 in #18
- [feat] Support
async_reset_consumptionto reuse data by @Aurelius84 in #25 - [refactor] Simplify initialization and improve API usability by @0oshowero0 in #26
- [feat] Introduce high-level key-value (KV) interface by @0oshowero0 in #28
- [feat] Add
RayStorageto backend choices by @Evelynn-V in #27 - [fix] Fix race condition in
update_production_statusby @0oshowero0 in #34 - [feat] Support lazy init when calling TQ API by @MissFishY in #33
- [perf] Improve performance for putting jagged tensor by @0oshowero0 in #36
- [perf] Use
zmq.asyncio.Contextto acceleratenotify_data_updateprocess by @0oshowero0 in #38 - [perf] Add
zmq.proxyto accelerate request processing forSimpleStorageUnitby @0oshowero0 in #37 - [feat] Support reverse mapping from
global_indexestokeysin KV interface by @0oshowero0 in #41 - [fix] Convert
TransferQueueControllerto non-detached Ray actor to prevent resource leaks by @0oshowero0 in #43 - [fix] Support IPv6 address by @0oshowero0 in #42
- [fix,feat] Support
MooncakeStoreeasy init by @0oshowero0 in #45 - [fix,serialization] Fix
FieldMetastatus update and remove unnecessary copy and userecv_multipart(copy=False)by default by @0oshowero0 in #46 - [fix] Fix
custom_backend_metarelated issue by @dpj135 in #47 - [perf] Reduce memory peak time for putting regular tensor by @0oshowero0 in #54
- [optimize] Refactor
BatchMetato ordinary class by @0oshowero0 in #53 - [CI] Split workflows for easier maintenance by @0oshowero0 in #56
- [recipe] Refactor recipe demo to use KV interfaces by @dodatboii in #55
- [tutorial] feat: add a basic kv tutorial in jupyter notebook by @vermouth1992 in #59
- [ci] feat: add nested tensor test in kv interface by @vermouth1992 in #58
- [feat,CI] Improve KV API usability and
KVBatchMetainteractions by @0oshowero0 in #57 - [feat] Support changing master_server port for MooncakeStore by @0oshowero0 in #62
- [ci] feat: add more tests, add jupyter tutorial to ci by @vermouth1992 in #61
- [recipe] feat: Revamp single-controller demo with agentic multi-turn rollout and add CI by @vermouth1992 in #63
- [Perf] Refactor performance test for different kv store backends by @tianyi-ge in #52
- [misc] refactor: remove sys.path hacks; rely on installed package layout by @ji-huazhong in #64
- [chore] bump version to 0.1.6.dev by @ji-huazhong in #65
- [fix] Use non-detached Ray actor for
SimpleStorageUnitby @0oshowero0 in #68 - Add automatic Yuanrong startup to interface.py by @dpj135 in #60
- [feat] Add
SeqlenBalancedSamplerand enhanceStreamingDatasetsupport by @NINGBENZHE in #70 - [chore] Update README and bump version to 0.1.6 by @0oshowero0 in #67
- [chore] Update README for new performance test by @0oshowero0 in #71
- [chore] Update README by @0oshowero0 in #72
New Contributors
- @0oshowero0 made their first contribution in #4
- @tianyi-ge made their first contribution in #5
- @Evelynn-V made their first contribution in #12
- @dpj135 made their first contribution in #14
- @NINGBENZHE made their first contribution in #23
- @Aurelius84 made their first contribution in #25
- @MissFishY made their first contribution in #33
- @dodatboii made their first contribution in #55
- @vermouth1992 made their first contribution in #59
- @ji-huazhong made their first contribution in #64
Full Changelog: https://github.com/Ascend/TransferQueue/commits/v0.1.6
Special Thanks
We sincerely thank the verl community, Mooncake, Rednote AI Platform and OpenYuanrong for their tremendous support and invaluable feedback.