[feat] Support cross-job actor discovery via explicit namespace #115
CLA Signature Guide: @huniu20, thanks for your pull request. The following commit(s) are not associated with a signed Contributor License Agreement (CLA).
To sign the CLA, click here. To check whether your email is configured correctly, refer to the FAQs. Once you've signed the CLA or updated your email, please comment
Thanks for your contribution. Kindly resolve the failing CI tests first. @huniu20
5a1900d to d02f41b
/check-cla
CLA Signature Pass: huniu20, thanks for your pull request. All authors of the commits have signed the CLA. 👍
d02f41b to 9af6f55
Pull request overview
This PR makes TransferQueue’s Ray named actors discoverable across multiple Ray Jobs sharing the same cluster by consistently using a fixed Ray namespace ("transfer_queue") when creating and retrieving the controller (and the Ray storage actor).
Changes:
- Add namespace="transfer_queue" to ray.get_actor("TransferQueueController", ...) call sites (library + tests + examples).
- Create TransferQueueController (and RayObjectRefStorage) explicitly in the "transfer_queue" namespace via .options(namespace=...).
- Update E2E tests and tutorial/demo code to retrieve the controller from the fixed namespace.
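The two halves of this change, registering the actor in a fixed namespace and looking it up in the same namespace, can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the helpers `create_controller` and `find_controller`, the constants, and the `DemoController` stand-in are hypothetical names. Only `ray.init`, `ray.get_actor(name, namespace=...)` and `.options(name=..., namespace=...)` are taken from Ray's public API. The Ray module is passed in as an argument so the helpers can be exercised without a running cluster.

```python
from typing import Any

# Values taken from the PR description; the constant names are assumptions.
TQ_NAMESPACE = "transfer_queue"
CONTROLLER_NAME = "TransferQueueController"


def create_controller(actor_cls: Any) -> Any:
    """Creator side: register the actor under a fixed name and namespace."""
    return actor_cls.options(name=CONTROLLER_NAME, namespace=TQ_NAMESPACE).remote()


def find_controller(ray_api: Any) -> Any:
    """Consumer side: look the actor up in the same fixed namespace."""
    return ray_api.get_actor(CONTROLLER_NAME, namespace=TQ_NAMESPACE)


if __name__ == "__main__":
    import ray

    @ray.remote
    class DemoController:  # hypothetical stand-in for TransferQueueController
        def ping(self) -> str:
            return "pong"

    # The driver deliberately runs in a different namespace; the explicit
    # namespace passed to options()/get_actor() is what makes the lookup work.
    ray.init(namespace="some_other_namespace")
    creator_handle = create_controller(DemoController)  # keep a reference alive
    found = find_controller(ray)
    print(ray.get(found.ping.remote()))
    ray.shutdown()
```

Whether the real controller also needs `lifetime="detached"` to outlive the job that created it is not stated in this PR, so the sketch leaves that option out.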
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tutorial/06_streaming_dataloader.py | Fetches the controller from the fixed Ray namespace in the tutorial worker path. |
| transfer_queue/storage/clients/ray_storage_client.py | Fetches/creates RayObjectRefStorage in the fixed Ray namespace to enable cross-job access. |
| transfer_queue/interface.py | Fetches/creates TransferQueueController in the fixed Ray namespace for cross-job discovery. |
| tests/e2e/test_kv_interface_e2e.py | Updates test fixture to retrieve controller from the fixed namespace. |
| recipe/simple_use_case/relax_demo.py | Updates demo worker to retrieve controller from the fixed namespace. |
  try:
      if _TQ_CONTROLLER is None:
-         _TQ_CONTROLLER = ray.get_actor("TransferQueueController")
+         _TQ_CONTROLLER = ray.get_actor("TransferQueueController", namespace="transfer_queue")

  # initialize actor
  try:
-     self.storage_actor = ray.get_actor("RayObjectRefStorage")
+     self.storage_actor = ray.get_actor("RayObjectRefStorage", namespace="transfer_queue")
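For readers unfamiliar with the fetch-then-create pattern these hunks sit inside, a rough sketch under stated assumptions: the helper name `get_or_create_actor` is hypothetical, and catching `ValueError` reflects Ray's documented behavior for `ray.get_actor` when the name is not found, not necessarily the exception handling used in this repository. The Ray module is injected so the logic can be checked without a cluster.

```python
from typing import Any

TQ_NAMESPACE = "transfer_queue"  # value from the PR; the constant name is an assumption


def get_or_create_actor(ray_api: Any, actor_cls: Any, name: str) -> Any:
    """Return the named actor from the fixed namespace, creating it if absent."""
    try:
        # Succeeds when any job on the cluster has already registered the
        # actor under this name in the fixed namespace.
        return ray_api.get_actor(name, namespace=TQ_NAMESPACE)
    except ValueError:
        # ray.get_actor raises ValueError when the name is not found, so
        # create the actor in the same namespace that lookups use.
        return actor_cls.options(name=name, namespace=TQ_NAMESPACE).remote()
```

Two jobs can still race between the failed lookup and the creation. Ray's `get_if_exists=True` actor option is one way to close that gap; this PR does not say whether it is used.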
When multiple Ray Jobs share the same Ray cluster, Named Actors are isolated by namespace. Without an explicit namespace, a TQ Controller created by one job is invisible to workers in another job.

This commit adds namespace="transfer_queue" to both:
- ray.get_actor() in _init_from_existing()
- TransferQueueController.options() in init()

This ensures that the TQ Controller is always registered and discovered in the fixed "transfer_queue" namespace, enabling cross-job TQ sharing (e.g., a teacher server job creates TQ, and a trainer job connects to it).

This change is backward-compatible: single-job usage is unaffected since the namespace is consistent between creation and discovery.

Signed-off-by: huniu20 <huniumail@gmail.com>
9af6f55 to b20656a