[Design] Etha as a vLLM weight-transfer engine backend #88

## Goal

Land Etha as a **first-class vLLM weight-transfer backend**, not just an externally-driven example. Today the integration is one-way and example-shaped:

- `examples/vllm_weight_sync` drives vLLM through `collective_rpc("setup_tensorbus")` + `collective_rpc("receive_weights")` from outside.
- vLLM has no built-in concept of "Etha as a weight source"; the example monkey-patches in custom RPC names.
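
For context, the externally driven flow described above looks roughly like the sketch below. `StubEngine` is a hypothetical stand-in for a vLLM engine handle so the snippet runs without vLLM. Only the two RPC names come from the example; the argument shapes and the helper name `sync_weights_from_outside` are assumptions.

```python
from typing import Any


class StubEngine:
    """Hypothetical stand-in for a vLLM engine handle; it only records RPC calls."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, tuple[Any, ...]]] = []

    def collective_rpc(self, method: str, args: tuple[Any, ...] = ()) -> list[str]:
        # A real engine would fan the call out to every worker; the stub logs it.
        self.calls.append((method, args))
        return ["ok"]


def sync_weights_from_outside(engine: StubEngine, weight_version: int) -> None:
    # Caller-side glue of the kind each RL framework re-implements today.
    # Passing the version as a positional argument is an assumption.
    engine.collective_rpc("setup_tensorbus")
    engine.collective_rpc("receive_weights", args=(weight_version,))


engine = StubEngine()
sync_weights_from_outside(engine, weight_version=1)
```

The caller has to know both RPC names and their ordering, which is the coupling this issue wants to move behind a vLLM-side abstraction.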

The goal is for vLLM to recognize Etha as a registered weight-transfer engine backend, so that RL frameworks (and others) don't have to ship their own `collective_rpc` glue.

## Motivation

1. **Production RL needs this contract stable.** Right now every RL framework rolls its own version of the `examples/vllm_weight_sync/` glue. Pulling it into a vLLM-side abstraction means one converter library, one lifecycle, one error contract.
2. **vLLM-driven placement discovery is already the right design** (see `transport._get_placements` in [#87]). It belongs in vLLM, not in user code that walks vLLM internals.
3. **Avoids the `collective_rpc` hack.** The example only works because vLLM lets us call arbitrary `collective_rpc` names. That's not an API contract.

## Proposed shape (sketch — needs design pass)

Hand vLLM a `WeightTransferBackend` protocol that owns the receive-side lifecycle:

```python
from __future__ import annotations  # MeshInfo is a placeholder type, not defined yet

from typing import Protocol

import torch.nn as nn

# inside vLLM, registered like quant backends are today
class WeightTransferBackend(Protocol):
    def setup(self, model: nn.Module, mesh_info: MeshInfo) -> None: ...
    def receive(self, weight_version: int) -> None: ...
    def teardown(self) -> None: ...

# Etha provides:
class EthaWeightTransferBackend(WeightTransferBackend):
    # init_pair once, register_tensors per round, drive transport
    ...
```

vLLM would expose a single endpoint (HTTP or RPC), such as `POST /weights/sync {version}`, instead of forcing callers to know about `collective_rpc`. The backend is selected through vLLM config (`weight_transfer_backend: "etha"`).
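
One way the config-driven selection could work is a name-to-class registry, sketched below. Everything here is hypothetical: `register_backend`, `create_backend`, and the dict-shaped config are illustrative names, not existing vLLM APIs. The protocol is trimmed to `receive` and `teardown` so the snippet runs without torch.

```python
from typing import Callable, Protocol


class WeightTransferBackend(Protocol):
    def receive(self, weight_version: int) -> None: ...
    def teardown(self) -> None: ...


# Hypothetical registry: backend name -> zero-argument factory.
_BACKENDS: dict[str, Callable[[], WeightTransferBackend]] = {}


def register_backend(name: str):
    """Class decorator that records a backend class under `name`."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator


@register_backend("etha")
class EthaWeightTransferBackend:
    def __init__(self) -> None:
        self.current_version = -1

    def receive(self, weight_version: int) -> None:
        # Placeholder for the real transport; only the version is tracked here.
        self.current_version = weight_version

    def teardown(self) -> None:
        self.current_version = -1


def create_backend(config: dict) -> WeightTransferBackend:
    # Look up the backend named in the config and instantiate it.
    name = config["weight_transfer_backend"]
    if name not in _BACKENDS:
        raise ValueError(f"unknown weight-transfer backend: {name!r}")
    return _BACKENDS[name]()


backend = create_backend({"weight_transfer_backend": "etha"})
backend.receive(weight_version=3)
```

A registry keeps the protocol generic: another vendor could register under a different name without vLLM knowing about it ahead of time.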

Open design questions:
- Where does the HF ↔ vLLM converter live: on the vLLM side (where model-specific knowledge already lives) or on the Etha side (as in the current example)?
- Is receive synchronous or asynchronous? Does the engine pause the forward pass, or double-buffer the weights?
- What is the failure mode if the trainer-side peer is down? Can receive fail cleanly without taking down inference?
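
To make the last two questions concrete, here is a minimal sketch of one possible answer, not a decision: stage incoming weights in a shadow buffer and swap only on success, so a failed transfer leaves inference on the previous version. `DoubleBufferedReceiver`, `TransferError`, `SyncResult`, and the `fetch` callable are all hypothetical names, and plain dicts stand in for real tensors.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class TransferError(RuntimeError):
    """Hypothetical error raised when the trainer-side peer is unreachable."""


@dataclass
class SyncResult:
    ok: bool
    served_version: int
    error: Optional[str] = None


class DoubleBufferedReceiver:
    def __init__(self, fetch: Callable[[int], dict], initial_weights: dict,
                 initial_version: int = 0) -> None:
        self._fetch = fetch              # version -> weights; may raise TransferError
        self._active = initial_weights   # the buffer inference reads from
        self._version = initial_version

    @property
    def active(self) -> dict:
        return self._active

    def receive(self, weight_version: int) -> SyncResult:
        try:
            staged = self._fetch(weight_version)  # fill the shadow buffer
        except TransferError as exc:
            # Peer is down: report the failure and keep serving the old weights.
            return SyncResult(False, self._version, str(exc))
        # Swap only after the full transfer succeeded.
        self._active, self._version = staged, weight_version
        return SyncResult(True, self._version)


def flaky_fetch(version: int) -> dict:
    # Simulated transport: version 2 fails, everything else succeeds.
    if version == 2:
        raise TransferError("trainer peer unreachable")
    return {"w": float(version)}


rx = DoubleBufferedReceiver(flaky_fetch, {"w": 0.0})
first = rx.receive(1)
second = rx.receive(2)
```

The cost of this approach is holding two copies of the weights during a transfer; pausing the forward pass instead would avoid that at the price of serving latency.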

## Dependencies

- Blocked on: a vLLM engine-backend extension point. It is unclear whether one exists today; if not, a separate upstream issue is needed against vllm-project/vllm.
- Reference example: #87 (this is the user-space prototype that proves the placement-discovery and converter pattern).

## Non-Goals

- This issue is **not** a quick refactor of the example. The example stays as the demo; this issue tracks the design + upstream conversation needed to make Etha a registered backend.
- Not solving multi-vendor weight-transfer (Etha-specific for now; the protocol stays generic so others can implement).

## Acceptance Criteria (open)

- [ ] Design doc: protocol surface, lifecycle, failure model
- [ ] Decision: converter location (vLLM-side vs Etha-side)
- [ ] Upstream vLLM RFC or maintainer thread confirming extension point shape
- [ ] Etha-side prototype implementing the protocol against an example vLLM build
