Goal
Land Etha as a first-class vLLM weight-transfer backend, not just an externally-driven example. Today the integration is one-way and example-shaped:
examples/vllm_weight_sync drives vLLM through collective_rpc("setup_tensorbus") + collective_rpc("receive_weights") from outside.
vLLM has no built-in concept of "Etha as a weight source"; the example monkey-patches in custom RPC names.
Goal is for vLLM to recognize Etha as a registered weight-transfer engine backend so RL frameworks (and others) don't have to ship their own collective_rpc glue.
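To make the current shape concrete, here is a minimal sketch of string-dispatched RPC glue. FakeWorker and FakeEngine are hypothetical stand-ins, not vLLM classes, and the method bodies are placeholders; only the two RPC names come from the example. It assumes collective_rpc resolves a method name on every worker at call time, which is why the names behave like a convention rather than a contract.

```python
class FakeWorker:
    """Stand-in for one inference worker; not a real vLLM class."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.bus_ready = False
        self.weight_version = None

    # Stand-ins for the two custom methods the example adds from outside.
    def setup_tensorbus(self) -> int:
        self.bus_ready = True
        return self.rank

    def receive_weights(self, version: int) -> int:
        if not self.bus_ready:
            raise RuntimeError("setup_tensorbus must run first")
        self.weight_version = version
        return version


class FakeEngine:
    """Dispatches a method name to every worker, as a collective RPC would."""

    def __init__(self, num_workers: int) -> None:
        self.workers = [FakeWorker(r) for r in range(num_workers)]

    def collective_rpc(self, method: str, args: tuple = ()) -> list:
        # Lookup is by string, so a typo or renamed method fails only at call time.
        return [getattr(w, method)(*args) for w in self.workers]


engine = FakeEngine(num_workers=2)
engine.collective_rpc("setup_tensorbus")
versions = engine.collective_rpc("receive_weights", args=(7,))
```

Nothing here checks that a caller uses the right names or calls them in the right order before the request reaches the workers, which is the gap a registered backend would close.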
Motivation
Production RL needs this contract stable. Right now every RL framework rolls its own version of the examples/vllm_weight_sync/ glue. Pulling it into a vLLM-side abstraction means one converter library, one lifecycle, one error contract.
[...] transport._get_placements in [feat(examples): land vllm_weight_sync example (currently on examples/vllm-weight-sync branch) #87]). It belongs in vLLM, not in user code that walks vLLM internals.
Avoids the collective_rpc hack. The example only works because vLLM lets us call arbitrary collective_rpc names. That's not an API contract.
Proposed shape (sketch — needs design pass)
Hand vLLM an EngineBackend protocol (named WeightTransferBackend in the sketch) that owns the receive-side lifecycle:
# inside vLLM, registered like quant backends are today
from typing import Protocol
from torch import nn

# MeshInfo is a placeholder type, still to be defined in the design pass
class WeightTransferBackend(Protocol):
    def setup(self, model: nn.Module, mesh_info: "MeshInfo") -> None: ...
    def receive(self, weight_version: int) -> None: ...
    def teardown(self) -> None: ...

# Etha provides:
class EthaWeightTransferBackend(WeightTransferBackend):
    # init_pair once, register_tensors per round, drive transport
    ...
vLLM exposes a single endpoint (HTTP or RPC) POST /weights/sync {version} instead of forcing callers to know about collective_rpc. Backend selection via vLLM config (weight_transfer_backend: "etha").
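One way config-driven backend selection and the single sync entry point could fit together is sketched here. Everything in it is an assumption for illustration: the registry, register_weight_transfer_backend, build_backend, handle_weights_sync, and FakeEthaBackend are hypothetical names, not existing vLLM or Etha APIs. The model and mesh arguments are typed as plain objects so the sketch runs without torch.

```python
from typing import Callable, Dict, Protocol


class WeightTransferBackend(Protocol):
    """Receive-side lifecycle, mirroring the protocol sketched in this issue."""

    def setup(self, model: object, mesh_info: object) -> None: ...
    def receive(self, weight_version: int) -> None: ...
    def teardown(self) -> None: ...


# Hypothetical registry: backend name -> zero-argument factory.
_BACKENDS: Dict[str, Callable[[], WeightTransferBackend]] = {}


def register_weight_transfer_backend(name: str):
    """Decorator that records a backend factory under `name`."""
    def decorator(factory):
        _BACKENDS[name] = factory
        return factory
    return decorator


@register_weight_transfer_backend("etha")
class FakeEthaBackend:
    """Stand-in for an Etha backend; records calls instead of moving tensors."""

    def __init__(self) -> None:
        self.version = None
        self.ready = False

    def setup(self, model: object, mesh_info: object) -> None:
        self.ready = True

    def receive(self, weight_version: int) -> None:
        if not self.ready:
            raise RuntimeError("setup() must run before receive()")
        self.version = weight_version

    def teardown(self) -> None:
        self.ready = False


def build_backend(config: dict) -> WeightTransferBackend:
    """Resolve the `weight_transfer_backend` key from a config mapping."""
    name = config["weight_transfer_backend"]
    if name not in _BACKENDS:
        raise ValueError(f"unknown weight transfer backend: {name!r}")
    return _BACKENDS[name]()


def handle_weights_sync(backend: WeightTransferBackend, body: dict) -> dict:
    """What a POST /weights/sync {version} handler might delegate to."""
    version = int(body["version"])
    backend.receive(version)
    return {"status": "ok", "version": version}
```

With this shape a caller only needs the backend name and a version number; the lifecycle order (setup before receive) is enforced inside the backend rather than by each RL framework.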
Open design questions:
Where does the HF ↔ vLLM converter live — vLLM side (model-specific knowledge already there) or Etha side (current example)?
Sync vs async receive — does the engine pause forward, or double-buffer?
Failure mode if peer trainer side is down — fail receive cleanly without taking down inference?
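To ground the last two questions, here is one possible answer in miniature: a double-buffered receive that stages the new weights off to the side and swaps them in only on success, so a dead trainer peer leaves inference on the old version. This is a sketch under assumptions, not a decision: DoubleBufferedWeights, TransferError, and the fetch callable are hypothetical, and plain dicts stand in for model tensors.

```python
import threading
from typing import Callable, Tuple


class TransferError(RuntimeError):
    """Raised when the trainer side cannot deliver a weight version."""


class DoubleBufferedWeights:
    """Holds the active weights and swaps in a staged copy atomically."""

    def __init__(self, initial: dict, version: int = 0) -> None:
        self._active = initial
        self._version = version
        self._lock = threading.Lock()

    def snapshot(self) -> Tuple[int, dict]:
        """What a forward pass would read: (version, weights)."""
        with self._lock:
            return self._version, self._active

    def receive(self, version: int, fetch: Callable[[int], dict]) -> bool:
        """Fetch into a staging buffer; swap only if the fetch succeeds."""
        try:
            staged = fetch(version)  # may raise TransferError
        except TransferError:
            return False  # keep serving the previous weights
        with self._lock:
            self._active, self._version = staged, version
        return True


def peer_down(version: int) -> dict:
    raise TransferError("trainer peer unreachable")


buf = DoubleBufferedWeights({"w": 1.0})
failed = buf.receive(1, peer_down)
after_failure = buf.snapshot()
succeeded = buf.receive(1, lambda v: {"w": 2.0})
after_success = buf.snapshot()
```

The trade-off is memory: staging a full second copy avoids pausing forward passes but doubles the weight footprint during a sync, which is exactly what the sync vs async question has to weigh.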
Dependencies
Blocked on: vLLM engine backend extension point. Does this exist today? If not, a separate upstream issue is needed against vllm-project/vllm.
Non-Goals
This issue is not a quick refactor of the example. The example stays as the demo; this issue tracks the design + upstream conversation needed to make Etha a registered backend.
Not solving multi-vendor weight transfer (Etha-specific for now; the protocol stays generic so others can implement it).
Acceptance Criteria (open)
Design doc: protocol surface, lifecycle, failure model
Decision: converter location (vLLM-side vs Etha-side)
Upstream vLLM RFC or maintainer thread confirming extension point shape
Etha-side prototype implementing the protocol against an example vLLM build