lightseekorg · yubofredwang · May 22, 2026 · May 21, 2026
diff --git a/configs/README.md b/configs/README.md
@@ -37,6 +37,51 @@ python -m torchspec.train_entry --config configs/sglang_qwen3_8b.yaml training.l
 | `inference.sglang` | `tp_size`, `mem_fraction_static`, `extra_args` | SGLang engine settings (nested under inference) |
 | `mooncake` | `protocol`, `device_name` | Mooncake transfer engine settings |
 
+## Custom Ray placement
+
+Use `training.placement_strategy: custom` when training and inference must run
+on explicitly chosen Ray nodes. This is useful when the default `PACK` placement
+would put actors on nodes with the wrong network locality, cache state, or GPU
+partition.
+
+IP-based placement uses Ray's built-in `node:<ip>` resource and does not require
+custom Ray labels:
+
+```yaml
+training:
+  placement_strategy: custom
+  training_num_nodes: 1
+  training_num_gpus_per_node: 8
+  training_node_ips:
+    - 10.0.0.1
+
+inference:
+  inference_num_gpus: 16
+  inference_num_gpus_per_node: 8
+  inference_node_ips:
+    - 10.0.0.2
+    - 10.0.0.3
+```
+
+Ray label selectors can be used instead when your Ray version supports
+`bundle_label_selector` for placement groups:
+
+```yaml
+training:
+  placement_strategy: custom
+  training_node_selectors:
+    - {"torchspec/node": "trainer-0"}
+
+inference:
+  inference_node_selectors:
+    - {"torchspec/node": "infer-0"}
+    - {"torchspec/node": "infer-1"}
+```
+
+For each role, set either `*_node_ips` or `*_node_selectors`, not both. The
+configured node order is preserved; for multi-node inference it determines the
+engine actor order and therefore the `node_rank` passed to SGLang or vLLM.
+
 ## SGLang engine configuration
 
 SGLang settings live under `inference.sglang` and are split into two tiers:

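Note on the README rules above: the "either `*_node_ips` or `*_node_selectors`, not both" constraint and the order-preserving behaviour can be sketched in a few lines of Python. This is only an illustration, not the code added by this PR. The helper name `resolve_node_targets` is hypothetical, and the sketch assumes that `node_rank` is simply the index of a node in the configured list, which the README implies but does not state outright.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

# A target is either an IP string or a label-selector dict (hypothetical alias).
NodeTarget = Union[str, Dict[str, str]]


def resolve_node_targets(
    role: str,
    node_ips: Optional[Sequence[str]] = None,
    node_selectors: Optional[Sequence[Dict[str, str]]] = None,
) -> List[Tuple[int, NodeTarget]]:
    """Return (node_rank, target) pairs in the order given in the config.

    Hypothetical helper: rejects configs that set both fields for one role,
    and assumes node_rank equals the position in the configured list.
    """
    if node_ips and node_selectors:
        raise ValueError(
            f"{role}: set either {role}_node_ips or {role}_node_selectors, not both"
        )
    targets: List[NodeTarget] = list(node_ips or node_selectors or [])
    if not targets:
        raise ValueError(f"{role}: custom placement needs at least one node")
    # enumerate() keeps the configured order, so the first entry gets rank 0.
    return list(enumerate(targets))


ranks = resolve_node_targets("inference", node_ips=["10.0.0.2", "10.0.0.3"])
```

With the README's IP example, `10.0.0.2` would be the first inference node and `10.0.0.3` the second under this assumption, so swapping the two entries in the config would swap their ranks.
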
diff --git a/docs/code_architecture.md b/docs/code_architecture.md
@@ -12,7 +12,7 @@ torchspec/
 ├── ray/                     # Ray infrastructure (shared across all packages)
 │   ├── ray_actor.py         #   RayActor base class (GPU setup, network utils)
 │   ├── train_group.py       #   RayTrainGroup (manages training actor group)
-│   └── placement_group.py   #   Placement group creation, GPU resource management
+│   └── placement_group.py   #   Placement group creation, GPU resource management, custom node placement
 ├── controller/              # Async pipeline orchestration
 │   ├── training_controller.py  # AsyncTrainingController (Ray actor)
 │   ├── inference_manager.py    # AsyncInferenceManager (Ray actor)
@@ -209,11 +209,14 @@ training:
   ttt_length: 7                   # Speculative depth
   train_backend: fsdp
   fsdp_strategy: REPLICATE
+  placement_strategy: training_first  # or inference_first/custom
+  training_node_ips: null             # custom placement only
 
 inference:
   inference_engine_type: hf       # or "sgl"
   inference_batch_size: 1
   inference_num_gpus: 4
+  inference_node_ips: null        # custom placement only
   sglang:                         # nested under inference
     tp_size: 8
     extra_args:                   # power-user passthrough to sgl.Engine
@@ -258,7 +261,7 @@ python train.py --config base.yaml --config experiment.yaml training.learning_ra
 |--------|---------|
 | `torchspec/ray/ray_actor.py` | `RayActor` base class (GPU setup, IP/port utils, master addr negotiation) |
 | `torchspec/ray/train_group.py` | `RayTrainGroup` - Manages a group of training actors |
-| `torchspec/ray/placement_group.py` | Placement group creation, GPU resource waiting, `create_placement_groups()`, `create_train_group()` |
+| `torchspec/ray/placement_group.py` | Placement group creation, GPU resource waiting, custom node placement, `create_placement_groups()`, `create_train_group()` |
 
 ### Controller
 

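Note on "custom node placement" in `placement_group.py`: the diff does not show how the bundles are built, so the following is a minimal sketch under stated assumptions rather than the PR's implementation. It assumes one bundle per GPU and pins each bundle to a node by requesting a tiny fraction of Ray's built-in `node:<ip>` resource, a common Ray idiom. The function names `expected_node_count` and `ip_pinned_bundles` and the `0.001` fraction are illustrative choices, not names from TorchSpec.

```python
from typing import Dict, List, Sequence


def expected_node_count(num_gpus: int, gpus_per_node: int) -> int:
    """ceil(num_gpus / gpus_per_node) using integer arithmetic."""
    if num_gpus <= 0 or gpus_per_node <= 0:
        raise ValueError("GPU counts must be positive")
    return -(-num_gpus // gpus_per_node)


def ip_pinned_bundles(
    node_ips: Sequence[str], gpus_per_node: int
) -> List[Dict[str, float]]:
    """Build one bundle per GPU, grouped by node in the configured order.

    Assumption: each Ray node exposes a `node:<ip>` resource with capacity
    1.0, so a small fractional request pins the bundle to that node
    without exhausting the resource.
    """
    bundles: List[Dict[str, float]] = []
    for ip in node_ips:
        for _ in range(gpus_per_node):
            bundles.append({"GPU": 1, f"node:{ip}": 0.001})
    return bundles


inference_ips = ["10.0.0.2", "10.0.0.3"]
# 16 GPUs at 8 per node must map onto exactly two configured nodes.
assert len(inference_ips) == expected_node_count(16, 8)
bundles = ip_pinned_bundles(inference_ips, gpus_per_node=8)
```

A list like `bundles` could then be passed to `ray.util.placement_group(...)`. Whether TorchSpec does this, and which strategy it uses for the custom unified group, is not visible in this diff.
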
diff --git a/docs/ray.md b/docs/ray.md
@@ -35,12 +35,13 @@ Placement groups reserve GPUs for training and inference as a unit and place the
 
 | Mode | Training GPUs | Inference GPUs | Use case |
 |------|--------------|----------------|----------|
-| Default (separate) | Dedicated PG | Dedicated PG | Production: no GPU contention |
+| Default | Sliced from unified PG | Sliced from unified PG | Production: deterministic node-to-role assignment |
+| `custom` | Sliced from custom unified PG | Sliced from custom unified PG | Production: explicit node choice with the same unified reservation semantics |
 | `colocate` | Shared PG | Shared PG | Dev: share GPUs between train & inference |
 | `debug_train_only` | Dedicated PG | Empty | Debug training without inference |
 | `debug_inference_only` | Empty | Dedicated PG | Debug inference without training |
 
-Each placement group probes bundles with a temporary `InfoActor` to discover the actual (node IP, GPU ID) mapping, then sorts by (node, GPU ID) for deterministic ordering.
+Each placement group probes bundles with a temporary `InfoActor` to discover the actual (node IP, GPU ID) mapping, then sorts by (node, GPU ID) for deterministic ordering. In `custom` mode, TorchSpec sorts by the configured node order first and by physical GPU ID within each selected node.
 
 ## Ray Cluster Setup
 
@@ -134,6 +135,65 @@ The PACK placement strategy spreads them across nodes automatically.
 | `training.training_num_nodes` | 1 | Number of training nodes |
 | `training.training_num_gpus_per_node` | 1 | GPUs per training node |
 
+### Custom node placement
+
+By default, TorchSpec creates a unified placement group with Ray's `PACK`
+strategy, probes the resulting bundles, and assigns the ordered bundles to
+training or inference according to `training.placement_strategy`
+(`training_first` or `inference_first`). Set
+`training.placement_strategy: custom` to explicitly choose the nodes for each
+role while still reserving the non-colocated training and inference bundles in a
+single unified placement group.
+
+IP-based placement uses Ray's built-in per-node `node:<ip>` resource and does
+not require custom Ray labels:
+
+```yaml
+training:
+  placement_strategy: custom
+  training_num_nodes: 2
+  training_num_gpus_per_node: 8
+  training_node_ips:
+    - 10.0.0.1
+    - 10.0.0.3
+
+inference:
+  inference_num_gpus: 16
+  inference_num_gpus_per_node: 8
+  inference_node_ips:
+    - 10.0.0.2
+    - 10.0.0.4
+```
+
+Ray label selectors can be used instead when the installed Ray version supports
+`bundle_label_selector` for placement groups. Start the Ray nodes with labels,
+then use matching selectors in the config:
+
+```yaml
+training:
+  placement_strategy: custom
+  training_num_nodes: 2
+  training_num_gpus_per_node: 8
+  training_node_selectors:
+    - {"torchspec/node": "trainer-0"}
+    - {"torchspec/node": "trainer-1"}
+
+inference:
+  inference_node_selectors:
+    - {"torchspec/node": "infer-0"}
+    - {"torchspec/node": "infer-1"}
+```
+
+The configured node order is preserved. For multi-node inference, this order
+determines the order of inference engine actors and therefore the `node_rank`
+passed to SGLang or vLLM. Within each selected node, bundles are ordered by the
+actual GPU ID discovered by `InfoActor`.
+
+The number of configured training nodes must equal
+`training.training_num_nodes`. The number of configured inference nodes must
+equal `ceil(inference.inference_num_gpus / inference.inference_num_gpus_per_node)`.
+For each role, set either `*_node_ips` or `*_node_selectors`, not both.
+
 ### Inference across nodes (SglEngine multi-node TP)
 
 When a single model is too large for one node, SglEngine supports multi-node