Releases · Ascend/TransferQueue

08 Jun 02:30

0oshowero0

v0.1.8

35bcf19

v0.1.8 Latest

Latest

Highlight

🚀 New Features

Cross-Job Actor Discovery: Support cross-job actor discovery via an explicit namespace, improving usability in multi-job environments. (#115)
RDMA Support for openYuanrong: Enable openYuanrong RDMA support to leverage high-performance network transmission. (#108)

⚙️ Refactor

Unified Bootstrap Path: Introduce a dedicated bootstrap subfolder to isolate and organize the initialization codes for different storage backends, enhancing codebase scalability. (#103)
Zero-Copy Serialization Utilities: Provide general serialization tools for KV backends in serial_utils.py. The newly introduced batch_encode_into and batch_decode_from interfaces support zero-copy serialization, aligning with the RDMA transmission requirements of various backends. (#107)

# Example 1: batch_encode_into
def _put_bytes_thread_worker(self, batch_keys: list[str], batch_values: list[Any]) -> list[int]:
    """Worker thread for putting batch of non-tensors to MooncakeStore."""

    # TODO: switch to a pre-registered buffer from MooncakeStore once such an API is available.
    region_ptrs: list[int] = []
    region_sizes: list[int] = []

    def alloc(sizes: list[int]) -> list[Tensor]:
        nonlocal region_ptrs, region_sizes
        # `batch_packed_sizes` are byte counts. With torch.uint8 (1 byte/element),
        # a 1-D shape of (N,) corresponds to exactly N bytes. We use
        # `allocate_empty_tensors` to get N uint8 views over a single contiguous,
        # register-able region. These are plain byte buffers, not real tensors;
        # consumers apply the actual dtype/shape interpretation when unpacking.
        dtypes = [torch.uint8] * len(sizes)
        shapes = [(s,) for s in sizes]
        buffers, _, region_ptrs, region_sizes = allocate_empty_tensors(dtypes, shapes)
        return buffers

    buffers, batch_sizes = serial_utils.batch_encode_into(
        batch_values, alloc, num_workers=MAX_SERIAL_WORKER_THREADS
    )
    batch_ptrs = [cast(Tensor, b).data_ptr() for b in buffers]

    self._register_all_buffers(region_ptrs, region_sizes)
    try:
        self._batch_upsert_with_retry(batch_keys, batch_ptrs, batch_sizes)
    finally:
        self._unregister_all_buffers(region_ptrs)

    return batch_sizes

# Example 2: batch_decode_from
def _get_bytes_thread_worker(
    self, batch_keys: list[str], batch_packed_sizes: list[int], indexes: list[int]
) -> tuple[list[Any], list[int]]:
    # `batch_packed_sizes` are byte counts. With torch.uint8 (1 byte/element),
    # a 1-D shape of (N,) corresponds to exactly N bytes. We use
    # `allocate_empty_tensors` to get N uint8 views over a single contiguous,
    # register-able region. These are plain byte buffers, not real tensors;
    # consumers apply the actual dtype/shape interpretation when unpacking.
    batch_shapes = [(sz,) for sz in batch_packed_sizes]
    batch_dtypes = [torch.uint8] * len(batch_keys)
    batch_nbytes = get_nbytes(batch_dtypes, batch_shapes)
    batch_buffer_tensors, batch_buffer_ptrs, region_ptrs, region_sizes = allocate_empty_tensors(
        batch_dtypes, batch_shapes
    )

    self._register_all_buffers(region_ptrs, region_sizes)
    try:
        self._batch_get_into_with_retry(batch_keys, batch_buffer_ptrs, batch_nbytes)
    finally:
        self._unregister_all_buffers(region_ptrs)

    return serial_utils.batch_decode_from(batch_buffer_tensors), indexes

🐛 Fixes & Improvements

Concurrency Conflict Fix: Resolve concurrency conflicts between data status updates and other control operations in TransferQueueController. (#116)
Dedicated Notification Loop: Isolate notify_data_update ZMQ I/O into a dedicated background asyncio loop to prevent ACK timeouts in StorageManager. (#117)

What's Changed

[chore] Bump version from 0.1.7 to 0.1.8.dev0 by @0oshowero0 in #100
[recipe] Provide Relax style recipe by @Jixixi2020 in #93
[refactor] Register storage backend for greater scalability by 🎉@fy2462 in #103
[doc] Adjust yuanrong backend doc by @dpj135 in #104
[feat] Enable openYuanrong RDMA support by @KaisennHu in #108
[chore] Optimize config descriptions for better understanding by @0oshowero0 in #109
[refactor] Provide common serialization tools for KV backends to speed up tensor serial in nested values by 🎉@xupinjie in #107
[refactor] Use batch_encode_into/batch_decode_from for Yuanrong backend by @dpj135 in #110
[perf] Enable multi-thread serial for non-tensor values in MooncakeStore backend by @0oshowero0 in #111
[chore] Relax numpy version constraints by @0oshowero0 in #113
[feat] Support cross-job actor discovery via explicit namespace by 🎉@huniu20 in #115
[fix,refactor] Merge update_data_status thread/socket into request_handle to eliminate concurrency conflicts by @dodatboii in #116
[fix,refactor] Isolate notify_data_update ZMQ I/O into a dedicated background asyncio loop by @dodatboii in #117
[chore] Update README & bump version from 0.1.8.dev0 to 0.1.8 by @0oshowero0 in #118

New Contributors

@Jixixi2020 made their first contribution in #93
@fy2462 made their first contribution in #103
@xupinjie made their first contribution in #107
@huniu20 made their first contribution in #115

Full Changelog: v0.1.7...v0.1.8

Contributors

fy2462, 0oshowero0, and 6 other contributors

Assets 3

14 May 14:23

0oshowero0

v0.1.7

01572d2

v0.1.7

Highlight

🚀 New Features

User-Defined Data Parser: Support user-defined data parsers to dynamically materialize URLs or file paths inside SimpleStorage. (#82)

Observability: Provide metrics exporter and Grafana dashboard integration for comprehensive system monitoring. (#83)

⚙️ Backends

MooncakeStore:
- Refactored backend to support zero-copy API, hard-pin, and upsert for better performance. (#77)
- Added retransmission mechanism for robust data transport. (#94)
- Increased client_ttl to mitigate heartbeat timeouts. (#99)
Yuanrong:
- Support metastore mode for backend initialization. (#74)
- Imporved robustness of yuanrong_client when calling clear_partition (#76)

🐛 Fixes & Improvements

Default Tensor Type Update: Use jagged tensor as the default tensor type to prevent user-side judgement errors in certain corner cases. (#92)
Fix semantic inconsistencies in BatchMeta.union. (#95)
Allow None values in _pack_field_values and fallback to NonTensorStack. (#75)
Clean up legacy warnings for the mooncake upsert API. (#91)

What's Changed

[chore] Bump version from 0.1.6 to 0.1.7.dev0 by @0oshowero0 in #73
[fix] Allow None values in _pack_field_values and fallback to NonTensorStack by 🎉@NINGBENZHE in #75
[fix] Imporved robustness of yuanrong_client when calling clear_partition by @dpj135 in #76
[chore] Update README by @0oshowero0 in #80
[chore] Update README: TransferQueue has been adopted in Relax by @0oshowero0 in #81
[feat] Support metastore mode for Yuanrong backend init by @KaisennHu in #74
[feat] Support user-defined data parser for SimpleStorage backend by @0oshowero0 in #82
[client, storage] refactor: unify dynamic ZMQ socket decorator between simple_backend_manager and client by @ji-huazhong in #66
[misc] refactor: extract get_logger utility to reduce code duplication by @ji-huazhong in #84
[perf] Refactor MooncakeStore backend with zero-copy upsert API by @0oshowero0 in #77
[misc] refactor: simplify internal classes naming by @0lynnlin0 in #86
[BREAKING][fix] Use jagged tensor as default tensor type by @0oshowero0 in #92
[fix] Remove legacy warnings for mooncake upsert API by 🎉 @stmatengss in #91
[feat] Add metrics exporter and dashboard for TransferQueue by 🎉@RobotGF in #83
[chore] Update openYuanrong related expression by @0oshowero0 in #96
[fix] Fix BatchMeta.union semantics by @0oshowero0 in #95
[feat] Add retransmission mechanism for MooncakeStoreClient by @0oshowero0 in #94
[fix] Increase mooncake_master client_ttl to mitigate client heartbeat timeout by @0oshowero0 in #99
[chore] Bump version from 0.1.7.dev0 to 0.1.7 by @0oshowero0 in #97

New Contributors

@KaisennHu made their first contribution in #74
@0lynnlin0 made their first contribution in #86
@stmatengss made their first contribution in #91
@RobotGF made their first contribution in #83

Full Changelog: v0.1.6...v0.1.7

Contributors

stmatengss, RobotGF, and 6 other contributors

Assets 3

07 Apr 01:25

0oshowero0

v0.1.6

e04cc05

v0.1.6

Highlight

Interface

Support easy init through tq.init(config)
Support key-value based API: https://github.com/Ascend/TransferQueue/blob/main/tutorial/basic.ipynb
Experimentally support StreamingDataLoader, with customizable SeqlenBalancedSampler: https://github.com/Ascend/TransferQueue/blob/main/tutorial/06_streaming_dataloader.py 🎉 @NINGBENZHE @Aurelius84

Performance

Refactor BatchMeta structure to reduce computational complexity: https://gitcode.com/Ascend/TransferQueue/pull/28、https://gitcode.com/Ascend/TransferQueue/pull/29
Optimize zero-copy implementation: https://gitcode.com/Ascend/TransferQueue/pull/7、https://gitcode.com/Ascend/TransferQueue/pull/13、https://gitcode.com/Ascend/TransferQueue/pull/23

Backends

Support MooncakeStore backend 🎉 @zhaohaidao
Support Yuanrong backend

Recipe

Refactor single controller integration demo: https://github.com/Ascend/TransferQueue/blob/main/recipe/simple_use_case/single_controller_demo.py 🎉 @vermouth1992

CI

Add E2E consistency tests: https://gitcode.com/Ascend/TransferQueue/pull/25, #28

What's Changed

[StreamingDataLoader, 1/N] feat: implement RankAwareSampler by @0oshowero0 in #4
[StreamingDataLoader, 2/N] feat: support async sampling and data pre-fetch in RankAwareSampler by @0oshowero0 in #7
[feat] Support store custom_meta in controller for backend-specific info by @tianyi-ge in #5
[feat] Provide fine-grained production & consumption status retrieval by @0oshowero0 in #8
[StreamingDataLoader, 3/N] feat: implement StreamingDataSet and StreamingDataLoader by @0oshowero0 in #9
[chore] Add sanity checks for docstring, license and DCO by @0oshowero0 in #10
[fix] Fix UT of YuanrongStorageClient by @Evelynn-V in #12
[feat] Improve TransferQueueClient sync API compatibility with async contexts by @0oshowero0 in #11
[chore] Update the code repository link in README by @dpj135 in #14
[chore] Optimize pyproject and CI script by @0oshowero0 in #13
[StreamingDataLoader, 4/N] feat: Introduce sample pre-allocation for dynamic streaming by @0oshowero0 in #16
[chore] Update README by @0oshowero0 in #17
[fix] Add back TQ_ZERO_COPY_SERIALIZATION switch by @0oshowero0 in #19
[chore] Update README with WeChat group by @0oshowero0 in #22
[StreamingDataLoader, 5/N] Refactor StreamDataLoader implementation by @NINGBENZHE in #23
[feat] Provide user-defined custom_meta methods by @0oshowero0 in #21
[refactor] Refactor yuanrong_client by @dpj135 in #18
[feat] Support async_reset_consumption to reuse data by @Aurelius84 in #25
[refactor] Simplify initialization and improve API usability by @0oshowero0 in #26
[feat] Introduce high-level key-value (KV) interface by @0oshowero0 in #28
[feat] Add RayStorage to backend choices by @Evelynn-V in #27
[fix] Fix race condition in update_production_status by @0oshowero0 in #34
[feat] Support lazy init when calling TQ API by @MissFishY in #33
[perf] Improve performance for putting jagged tensor by @0oshowero0 in #36
[perf] Use zmq.asyncio.Context to accelerate notify_data_update process by @0oshowero0 in #38
[perf] Add zmq.proxy to accelerate request processing for SimpleStorageUnit by @0oshowero0 in #37
[feat] Support reverse mapping from global_indexes to keys in KV interface by @0oshowero0 in #41
[fix] Convert TransferQueueController to non-detached Ray actor to prevent resource leaks by @0oshowero0 in #43
[fix] Support IPv6 address by @0oshowero0 in #42
[fix,feat] Support MooncakeStore easy init by @0oshowero0 in #45
[fix,serialization] Fix FieldMeta status update and remove unnecessary copy and use recv_multipart(copy=False) by default by @0oshowero0 in #46
[fix] Fix custom_backend_meta related issue by @dpj135 in #47
[perf] Reduce memory peak time for putting regular tensor by @0oshowero0 in #54
[optimize] Refactor BatchMeta to ordinary class by @0oshowero0 in #53
[CI] Split workflows for easier maintenance by @0oshowero0 in #56
[recipe] Refactor recipe demo to use KV interfaces by @dodatboii in #55
[tutorial] feat: add a basic kv tutorial in jupyter notebook by @vermouth1992 in #59
[ci] feat: add nested tensor test in kv interface by @vermouth1992 in #58
[feat,CI] Improve KV API usability and KVBatchMeta interactions by @0oshowero0 in #57
[feat] Support changing master_server port for MooncakeStore by @0oshowero0 in #62
[ci] feat: add more tests, add jupyter tutorial to ci by @vermouth1992 in #61
[recipe] feat: Revamp single-controller demo with agentic multi-turn rollout and add CI by @vermouth1992 in #63
[Perf] Refactor performance test for different kv store backends by @tianyi-ge in #52
[misc] refactor: remove sys.path hacks; rely on installed package layout by @ji-huazhong in #64
[chore] bump version to 0.1.6.dev by @ji-huazhong in #65
[fix] Use non-detached Ray actor for SimpleStorageUnit by @0oshowero0 in #68
Add automatic Yuanrong startup to interface.py by @dpj135 in #60
[feat] Add SeqlenBalancedSampler and enhance StreamingDataset support by @NINGBENZHE in #70
[chore] Update README and bump version to 0.1.6 by @0oshowero0 in #67
[chore] Update README for new performance test by @0oshowero0 in #71
[chore] Update README by @0oshowero0 in #72

New Contributors

@0oshowero0 made their first contribution in #4
@tianyi-ge made their first contribution in #5
@Evelynn-V made their first contribution in #12
@dpj135 made their first contribution in #14
@NINGBENZHE made their first contribution in #23
@Aurelius84 made their first contribution in #25
@MissFishY made their first contribution in #33
@dodatboii made their first contribution in #55
@vermouth1992 made their first contribution in #59
@ji-huazhong made their first contribution in #64

Full Changelog: https://github.com/Ascend/TransferQueue/commits/v0.1.6

Special Thanks

We sincerely thank the verl community, Mooncake, Rednote AI Platform and OpenYuanrong for their tremendous support and invaluable feedback.

Contributors

zhaohaidao, Aurelius84, and 9 other contributors

Assets 3

Releases: Ascend/TransferQueue

v0.1.8

Highlight

🚀 New Features

⚙️ Refactor

🐛 Fixes & Improvements

What's Changed

New Contributors

Contributors

Uh oh!

v0.1.7

Highlight

🚀 New Features

⚙️ Backends

🐛 Fixes & Improvements

What's Changed

New Contributors

Contributors

Uh oh!

v0.1.6

Highlight

Interface

Performance

Backends

Recipe

CI

What's Changed

New Contributors

Special Thanks

Contributors

Uh oh!