[ROCm][MoRI] WRITE mode support (layerwise xfer) #157

Merged
BugenZhao merged 32 commits into vllm-project:main from simondanielsson:feature/mori-write-mode
May 6, 2026

@simondanielsson simondanielsson commented Apr 29, 2026

Contributor

Purpose

Fixes #160.

This PR enables MoRI's WRITE mode in vllm-router, which allows layer-wise KV transfer. It works by dispatching each request to the prefill (P) and decode (D) instances concurrently.

Dependent PR:

Results

  • vllm-router performs similarly to the toy proxy reference implementation (green == orange in the figure below). However, the toy proxy starts failing requests at high concurrency.
  • Compared to READ mode, WRITE mode provides
    • up to 1.8x lower TTFT
    • significantly better TPOT
  • vllm#39565 ([Fix][MoRI] Align MoRI-IO message format with P2pNcclConnector and vllm-router) already showed that READ mode performance of vllm-router is similar to that of the toy proxy, so WRITE mode vllm-router should also outperform the READ mode toy proxy.

[Figure: benchmark_comparison]

1P1D with DSR1 (deepseek-ai/DeepSeek-R1-0528). See details below for reproducer.

Usage

Usage is identical to that introduced in #138, i.e. simply use --kv-connector moriio with service discovery enabled. The transfer mode is negotiated automatically during the router<->worker handshake. On the vLLM side, run with VLLM_MORIIO_CONNECTOR_READ_MODE=0 (or omit this env var entirely, as WRITE mode is the default).

Implementation details

Layerwise transfer requires dispatching the request to the P and D instances concurrently, rather than sequentially as in READ mode.

  • The current PD dispatching logic (i.e. READ mode) has four steps: (1) send a max_tokens=1 request to P, (2) await the response from P, (3) send the request to D, (4) stream the response from D.
  • The dispatching logic for WRITE mode is instead: (1) prepare the P and D requests, with the P request having max_tokens=1, (2) send the requests to P and D concurrently, (3) stream the response from D.
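
The difference between the two dispatch orders can be sketched in plain Rust. This is only an illustration using standard-library threads, not the router's actual async code; `send`, `dispatch_read`, and `dispatch_write` are hypothetical names, and the sleep stands in for network and compute latency.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an HTTP call to a worker; the sleep simulates latency.
fn send(worker: &str, max_tokens: u32, latency: Duration) -> String {
    thread::sleep(latency);
    format!("{worker}:max_tokens={max_tokens}")
}

/// READ-style dispatch: wait for P's response before contacting D.
fn dispatch_read(latency: Duration) -> (String, String) {
    let p = send("prefill", 1, latency);
    let d = send("decode", 1000, latency);
    (p, d)
}

/// WRITE-style dispatch: both requests are in flight at the same time.
fn dispatch_write(latency: Duration) -> (String, String) {
    thread::scope(|s| {
        let p = s.spawn(|| send("prefill", 1, latency));
        let d = s.spawn(|| send("decode", 1000, latency));
        (p.join().unwrap(), d.join().unwrap())
    })
}

fn main() {
    let latency = Duration::from_millis(50);

    let t = Instant::now();
    let read = dispatch_read(latency);
    println!("READ  {:?} took {:?}", read, t.elapsed());

    let t = Instant::now();
    let write = dispatch_write(latency);
    println!("WRITE {:?} took {:?}", write, t.elapsed());
}
```

In this toy model the WRITE-style call takes roughly one latency period instead of two, because the P and D requests overlap.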

How WRITE mode works in vLLM: D allocates blocks, notifies P about these blocks, and then waits. P receives this notification, and after every layer it asynchronously writes the KV it produced into these blocks. Hence the transfer of layer N is overlapped with the computation of layer N+1. After all layers have been written, P awaits the last write and validates that it succeeded. P then sends a notification to D that all blocks have been written. Upon this notification, D wakes up and the request is scheduled on the D side, which iteratively performs decode steps using the KV.
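
The overlap described above can be modelled with a background writer thread and a channel. This is a toy sketch under simplifying assumptions (one block per layer, strings instead of KV tensors, a thread join instead of a real notification); `layerwise_write` and all names inside it are hypothetical and do not correspond to vLLM or MoRI APIs.

```rust
use std::sync::mpsc;
use std::thread;

/// Simulates P computing each layer's KV and handing it to a background writer,
/// so the write of layer N can overlap with the computation of layer N+1.
/// Returns D's blocks once all writes have completed.
fn layerwise_write(num_layers: usize) -> Vec<Option<String>> {
    // D side: blocks are allocated up front (one slot per layer in this toy model).
    let mut decode_blocks: Vec<Option<String>> = vec![None; num_layers];

    let (tx, rx) = mpsc::channel::<(usize, String)>();

    // Background writer: stands in for the asynchronous remote writes.
    let writer = thread::spawn(move || {
        for (layer, kv) in rx {
            decode_blocks[layer] = Some(kv);
        }
        decode_blocks // handed back once every queued write has been applied
    });

    // P side: compute layer by layer without waiting for earlier writes to finish.
    for layer in 0..num_layers {
        let kv = format!("kv-layer-{layer}");
        tx.send((layer, kv)).expect("writer thread alive");
    }
    drop(tx); // no more layers; lets the writer loop terminate

    // Await the last write; the join plays the role of the "all written" notification.
    writer.join().expect("writer thread panicked")
}

fn main() {
    let blocks = layerwise_write(4);
    assert!(blocks.iter().all(|b| b.is_some()));
    println!("D can start decoding with {} layers of KV", blocks.len());
}
```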

Test Plan

  1. Bench: Compare READ vs WRITE mode, vllm-router vs toy-proxy
  2. GSM8k for correctness validation

Reproducer

Build router on this branch

docker build \
  -f Dockerfile.router \
  -t ghcr.io/simondanielsson/vllm-router:write-mode \
  .
# or alternatively docker pull ghcr.io/simondanielsson/vllm-router:write-mode

vLLM must be built from this branch to avoid hangs (vllm-project/vllm#40344):

docker pull ghcr.io/simondanielsson/vllm-rocm-moriio:dev-hang-fixes
# or nightly vllm docker once https://github.com/vllm-project/vllm/pull/40344 merged

Run 1P1D with WRITE mode (VLLM_MORIIO_CONNECTOR_READ_MODE=0):

# Set on both nodes before running any command
export PREFILL_IP=<set this>
export DECODE_IP=<set this>

# Node 1 (prefill node) — command 1: start vllm router
docker run \
  --name vllm-router \
  --network host \
  --rm \
  ghcr.io/simondanielsson/vllm-router:write-mode \
  vllm-router \
  --vllm-pd-disaggregation \
  --kv-connector moriio \
  --vllm-discovery-address "0.0.0.0:36367" \
  --policy consistent_hash \
  --prefill-policy consistent_hash \
  --decode-policy consistent_hash \
  --log-level info


# Node 1 (prefill node) — command 2: start prefill instance
docker run \
  --rm \
  --name moriio-prefill \
  --init --network host --ipc host --privileged \
  --cap-add SYS_PTRACE --security-opt seccomp=unconfined \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  --shm-size 256G \
  --group-add video --group-add render \
  --device /dev/kfd --device /dev/dri --device /dev/infiniband \
  -v /sys:/sys \
  -v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
  -e HF_HOME=/root/.cache/huggingface \
  -e HF_HUB_ENABLE_HF_TRANSFER=0 \
  -e VLLM_MORIIO_CONNECTOR_READ_MODE=0 \
  -e NCCL_MIN_NCHANNELS=112 \
  -e VLLM_USE_V1=1 \
  -e VLLM_ENGINE_READY_TIMEOUT_S=3600 \
  -e VLLM_SERVER_DEV_MODE=1 \
  -e VLLM_ROCM_USE_AITER=1 \
  -e VLLM_ROCM_USE_AITER_PAGED_ATTN=0 \
  -e VLLM_ROCM_USE_AITER_RMSNORM=1 \
  -e VLLM_USE_AITER_TRITON_SILU_MUL=0 \
  -e MORI_IO_ENABLE_NOTIFICATION=0 \
  -e NCCL_SOCKET_IFNAME=ens51np0 \
  ghcr.io/simondanielsson/vllm-rocm-moriio:dev-hang-fixes \
  deepseek-ai/DeepSeek-R1-0528 \
    --load-format dummy \
    --port 8100 \
    --tensor-parallel-size 8 \
    --kv-cache-dtype fp8 \
    --gpu-memory-utilization 0.7 \
    --max-num-batched-tokens 32768 \
    --max-model-len 16384 \
    --trust-remote-code \
    --no-enable-prefix-caching \
    --block-size 1 \
    --enforce-eager \
    --kv-transfer-config '{
      "kv_connector": "MoRIIOConnector",
      "kv_role": "kv_producer",
      "kv_connector_extra_config": {
        "proxy_ip": "'"${PREFILL_IP}"'",
        "proxy_ping_port": "36367",
        "http_port": "8100",
        "handshake_port": "6301",
        "notify_port": "61005"
      }
    }'

# Node 2 (decode node) — command 3: start decode instance
docker run \
  --rm \
  --name moriio-decode \
  --init --network host --ipc host --privileged \
  --cap-add SYS_PTRACE --security-opt seccomp=unconfined \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  --shm-size 256G \
  --group-add video --group-add render \
  --device /dev/kfd --device /dev/dri --device /dev/infiniband \
  -v /sys:/sys \
  -v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
  -e HF_HOME=/root/.cache/huggingface \
  -e HF_HUB_ENABLE_HF_TRANSFER=0 \
  -e VLLM_MORIIO_CONNECTOR_READ_MODE=0 \
  -e NCCL_MIN_NCHANNELS=112 \
  -e VLLM_USE_V1=1 \
  -e VLLM_ENGINE_READY_TIMEOUT_S=3600 \
  -e VLLM_SERVER_DEV_MODE=1 \
  -e VLLM_ROCM_USE_AITER=1 \
  -e VLLM_ROCM_USE_AITER_PAGED_ATTN=0 \
  -e VLLM_ROCM_USE_AITER_RMSNORM=1 \
  -e VLLM_USE_AITER_TRITON_SILU_MUL=0 \
  -e MORI_IO_ENABLE_NOTIFICATION=0 \
  ghcr.io/simondanielsson/vllm-rocm-moriio:dev-hang-fixes \
  deepseek-ai/DeepSeek-R1-0528 \
    --port 8200 \
    --load-format dummy \
    --tensor-parallel-size 8 \
    --kv-cache-dtype fp8 \
    --gpu-memory-utilization 0.7 \
    --max-num-batched-tokens 32768 \
    --max-model-len 16384 \
    --trust-remote-code \
    --no-enable-prefix-caching \
    --block-size 1 \
    --enable-expert-parallel \
    --all2all-backend mori \
    --compilation-config '{"cudagraph_mode": "PIECEWISE"}' \
    --kv-transfer-config '{
      "kv_connector": "MoRIIOConnector",
      "kv_role": "kv_consumer",
      "kv_connector_extra_config": {
        "proxy_ip": "'"${PREFILL_IP}"'",
        "proxy_ping_port": "36367",
        "http_port": "8200",
        "handshake_port": "6301",
        "notify_port": "61005"
      }
    }'

# Node 1 (prefill node) — command 4: run vllm bench serve
for input_len in 1000 8000; do
  for concurrency in 16 32 64 128 256; do
    docker exec moriio-prefill \
      vllm bench serve \
        --base-url http://localhost:30000 \
        --backend vllm \
        --model deepseek-ai/DeepSeek-R1-0528 \
        --dataset-name random \
        --random-input-len $input_len \
        --random-output-len 1000 \
        --max-concurrency $concurrency \
        --num-warmups $((concurrency * 2)) \
        --num-prompts $((concurrency * 10)) \
        --goodput ttft:1000 \
        --seed 1234 \
    | tee bench_router_input${input_len}_concurrency${concurrency}.txt
  done
done

#  Node 1 (prefill node) - GSM8k.
# Remember to remove --load-format dummy on the vllm instances though
docker exec moriio-prefill bash -c \
  "pip install --quiet 'lm_eval[api]' && \
   lm_eval \
     --model local-completions \
     --model_args model=deepseek-ai/DeepSeek-R1-0528,base_url=http://localhost:30000/v1/completions,tokenized_requests=False,trust_remote_code=True \
     --tasks gsm8k \
     --num_fewshot 5 \
     --output_path /tmp/lm_eval_gsm8k" 

Note: you can also test the above with the toy proxy:

# Swap out the vllm-router container for the toy proxy
docker run \
  --rm \
  --name moriio-toy-proxy \
  --network host \
  --entrypoint bash \
  ghcr.io/simondanielsson/vllm-rocm-moriio:dev-hang-fixes \
  -c "pip install --quiet --ignore-installed quart aiohttp msgpack && \
           python3 -u /app/vllm/examples/online_serving/disaggregated_serving/moriio_toy_proxy_server.py"

# and re-run the benchmark but with the default 10001 port
for input_len in 1000 8000; do
  for concurrency in 16 32 64 128 256; do
    docker exec moriio-prefill \
      vllm bench serve \
        --base-url http://localhost:10001 \
        --backend vllm \
        --model deepseek-ai/DeepSeek-R1-0528 \
        --dataset-name random \
        --random-input-len $input_len \
        --random-output-len 1000 \
        --max-concurrency $concurrency \
        --num-warmups $((concurrency * 2)) \
        --num-prompts $((concurrency * 10)) \
        --goodput ttft:1000 \
        --seed 1234 \
    | tee bench_toy_proxy_input${input_len}_concurrency${concurrency}.txt
  done
done

Test Result

Main results summarized above. Raw results below.

WRITE mode: vllm-router vs toy proxy

Also includes vllm-router READ.

1k/1k

| Router | Mode | Concurrency | Failed Reqs | Req/s | TTFT P50 / P99 (ms) | TPOT P50 / P99 (ms) | ITL P50 / P99 (ms) |
|--------|------|------------:|------------:|------:|---------------------|---------------------|--------------------|
| vllm-router | READ | 16 | 0 | 1.02 | 261.96 / 997.72 | 15.27 / 15.46 | 15.20 / 16.34 |
| toy-proxy | WRITE | 16 | 0 | 1.03 | 170.03 / 708.39 | 15.14 / 15.29 | 15.13 / 16.25 |
| vllm-router | WRITE | 16 | 0 | 1.03 | 238.92 / 678.49 | 15.20 / 15.75 | 15.18 / 16.54 |
| vllm-router | READ | 32 | 0 | 2.02 | 243.10 / 1335.86 | 15.19 / 15.66 | 15.15 / 17.64 |
| toy-proxy | WRITE | 32 | 0 | 2.04 | 242.21 / 1321.08 | 15.14 / 15.27 | 15.11 / 16.60 |
| vllm-router | WRITE | 32 | 0 | 2.02 | 202.82 / 1268.53 | 15.28 / 15.70 | 15.23 / 16.68 |
| vllm-router | READ | 64 | 0 | 3.60 | 791.34 / 3103.74 | 16.51 / 17.21 | 15.54 / 25.94 |
| toy-proxy | WRITE | 64 | 0 | 3.90 | 605.63 / 2528.67 | 15.48 / 16.06 | 15.40 / 18.52 |
| vllm-router | WRITE | 64 | 0 | 2.85 | 887.21 / 60563.53 | 15.54 / 15.78 | 15.49 / 18.05 |
| vllm-router | READ | 128 | 0 | 4.71 | 2286.87 / 57920.53 | 19.82 / 20.98 | 18.21 / 35.46 |
| toy-proxy | WRITE | 128 | 4 | 5.03 | 671.74 / 53810.47 | 18.23 / 18.53 | 18.19 / 26.61 |
| vllm-router | WRITE | 128 | 0 | 5.05 | 949.18 / 61034.77 | 18.24 / 18.37 | 18.21 / 23.91 |
| vllm-router | READ | 256 | 0 | 6.08 | 2857.77 / 64422.03 | 29.75 / 31.56 | 24.90 / 117.52 |
| toy-proxy | WRITE | 256 | 33 | 6.81 | 1354.28 / 61913.80 | 24.99 / 25.78 | 24.46 / 65.69 |
| vllm-router | WRITE | 256 | 0 | 6.65 | 1299.83 / 61447.22 | 25.40 / 25.63 | 25.10 / 56.03 |

8k/1k

| Router | Mode | Concurrency | Failed Reqs | Req/s | TTFT P50 / P99 (ms) | TPOT P50 / P99 (ms) | ITL P50 / P99 (ms) |
|--------|------|------------:|------------:|------:|---------------------|---------------------|--------------------|
| vllm-router | READ | 16 | 0 | 0.63 | 2176.66 / 59528.47 | 17.55 / 18.55 | 15.40 / 22.41 |
| toy-proxy | WRITE | 16 | 0 | 0.74 | 767.93 / 51578.94 | 15.19 / 15.35 | 15.15 / 21.64 |
| vllm-router | WRITE | 16 | 1 | 0.71 | 766.99 / 61397.36 | 15.12 / 15.47 | 15.10 / 21.03 |
| vllm-router | READ | 32 | 0 | 0.93 | 2930.00 / 63933.75 | 20.35 / 21.91 | 15.79 / 203.46 |
| toy-proxy | WRITE | 32 | 0 | 1.09 | 1067.96 / 70475.83 | 15.87 / 16.70 | 15.90 / 24.46 |
| vllm-router | WRITE | 32 | 0 | 1.08 | 790.78 / 70608.24 | 15.84 / 16.13 | 15.85 / 23.41 |
| vllm-router | READ | 64 | 0 | 1.16 | 3098.27 / 66645.72 | 29.95 / 31.91 | 19.28 / 424.22 |
| toy-proxy | WRITE | 64 | 0 | 1.37 | 5756.09 / 65180.50 | 17.99 / 18.95 | 18.01 / 31.69 |
| vllm-router | WRITE | 64 | 0 | 1.36 | 6136.02 / 66064.27 | 17.86 / 18.94 | 17.96 / 30.50 |
| vllm-router | READ | 128 | 0 | 1.30 | 68350.20 / 99389.23 | 42.07 / 50.25 | 25.19 / 455.45 |
| toy-proxy | WRITE | 128 | 4 | 1.38 | 83536.72 / 99228.10 | 18.21 / 18.86 | 18.13 / 32.20 |
| vllm-router | WRITE | 128 | 0 | 1.39 | 83549.72 / 99688.20 | 18.16 / 18.78 | 18.12 / 31.72 |

Accuracy (GSM8k)

WRITE mode, vllm-router:

local-completions ({'model': 'deepseek-ai/DeepSeek-R1-0528', 'base_url': 'http://localhost:30000/v1/completions', 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 5, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9538|±  |0.0058|
|     |       |strict-match    |     5|exact_match|↑  |0.9515|±  |0.0059|

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
@simondanielsson simondanielsson changed the title [ROCm][MoRI] WRITE mode support [ROCm][MoRI] WRITE mode support (layerwise xfer) Apr 29, 2026
@simondanielsson simondanielsson marked this pull request as ready for review April 29, 2026 18:21
@simondanielsson

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3d623322a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/routers/http/vllm_pd_router.rs Outdated
Comment thread src/routers/http/vllm_pd_router.rs
@tjtanaa tjtanaa requested a review from BugenZhao April 30, 2026 01:33
@simondanielsson

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a820dda17

Comment thread src/config/validation.rs
Comment on lines +111 to +114
if moriio_transfer_mode.set(mode).is_err() {
    let stored = moriio_transfer_mode.get().copied().unwrap();
    if stored != mode {
        warn!(

[P2] Allow transfer mode to change after instance turnover

The transfer mode is stored in a OnceLock and any later registration with a different mode is rejected forever. Since the value is never reset when old instances expire, a normal READ→WRITE (or WRITE→READ) worker rollout under a long-running router causes all new workers to be skipped; once old workers age out, service discovery can be left with no usable instances until the router is restarted.
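
The pinning behaviour described here can be reproduced with `std::sync::OnceLock` alone. The sketch below is a simplified, hypothetical model of the registration check (the `TransferMode` enum and `register` function are assumptions for illustration, not the router's actual types): the first registration fixes the mode, and a later registration with a different mode is rejected for as long as the lock lives.

```rust
use std::sync::OnceLock;

/// Hypothetical transfer mode type for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TransferMode {
    Read,
    Write,
}

/// The first worker pins the mode; later workers are accepted only if they match.
fn register(lock: &OnceLock<TransferMode>, mode: TransferMode) -> Result<(), String> {
    if lock.set(mode).is_err() {
        // `set` fails when a value is already stored, so `get` cannot be None here.
        let stored = lock.get().copied().unwrap();
        if stored != mode {
            return Err(format!(
                "worker uses {mode:?} but the router is pinned to {stored:?}"
            ));
        }
    }
    Ok(())
}

fn main() {
    let pinned = OnceLock::new();
    assert!(register(&pinned, TransferMode::Read).is_ok()); // first worker pins READ
    assert!(register(&pinned, TransferMode::Read).is_ok()); // same mode is accepted

    // A rollout to WRITE workers is rejected, even if all READ workers have expired,
    // because a OnceLock cannot be reset through a shared reference.
    let rejected = register(&pinned, TransferMode::Write);
    println!("{rejected:?}");
    assert!(rejected.is_err());
}
```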

Contributor Author

Don't think there's much of a reason to switch modes live. WRITE is the default and should perform better. Suggest we ignore this for now until someone needs this functionality as that would add a lot of complexity (e.g. handling the rollover period where instances of both modes are live)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b04a049743

Comment thread src/routers/http/vllm_pd_router.rs Outdated
Comment on lines +1017 to +1018
let (prefill_result, decode_result) = tokio::join!(prefill_fut, decode_fut);
let write_prefill_response_json: Option<Value> = match prefill_result {

[P1] Fail fast on WRITE prefill errors instead of waiting on decode

In the WRITE path, tokio::join! waits for both futures before prefill_result is inspected, so a fast prefill failure is still delayed until the decode request returns. When decode is blocked waiting for KV transfer, this turns a clear prefill error into a long timeout path and ties up request capacity. Even with the new error propagation, this join! ordering still prevents prompt failure handling.
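
For illustration, the fail-fast behaviour requested here can be modelled with standard-library threads and a channel instead of tokio. This is an analogy, not the router's code: `Event` and `dispatch_fail_fast` are hypothetical names, and unlike a real fix the slow decode thread is not cancelled, it is simply no longer waited on.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Which side reported, and with what outcome.
enum Event {
    Prefill(Result<String, String>),
    Decode(Result<String, String>),
}

/// Handles whichever side reports first and returns early on a prefill error,
/// rather than waiting for both results as a join would.
fn dispatch_fail_fast(prefill_fails: bool, decode_delay: Duration) -> Result<String, String> {
    let (tx, rx) = mpsc::channel();
    let tx_decode = tx.clone();

    thread::spawn(move || {
        let res = if prefill_fails {
            Err("prefill failed".to_string())
        } else {
            Ok("prefill ok".to_string())
        };
        let _ = tx.send(Event::Prefill(res));
    });
    thread::spawn(move || {
        thread::sleep(decode_delay); // decode is slow, e.g. blocked on KV transfer
        // The send fails harmlessly if the receiver has already returned.
        let _ = tx_decode.send(Event::Decode(Ok("decoded tokens".to_string())));
    });

    for event in rx {
        match event {
            Event::Prefill(Err(e)) => return Err(e), // do not wait for decode
            Event::Prefill(Ok(_)) => continue,       // keep waiting for decode
            Event::Decode(res) => return res,
        }
    }
    Err("both workers exited without a decode result".to_string())
}

fn main() {
    let started = Instant::now();
    let result = dispatch_fail_fast(true, Duration::from_millis(500));
    println!("{:?} after {:?}", result, started.elapsed());
}
```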

Contributor Author

Technically we could use tokio::select! or abort to cancel decode on prefill failure, but that adds complexity. I suggest ignoring this for now, but I'm happy to be challenged if others disagree.

Comment thread src/routers/http/vllm_service_discovery.rs Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d08cda7bb

Comment thread src/routers/http/vllm_pd_router.rs Outdated
Comment thread src/routers/http/vllm_pd_router.rs
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3dbd46e8f8

Comment thread src/routers/http/vllm_pd_router.rs
@simondanielsson

Contributor Author

CI failing due to unrelated changes, even after rebasing on main. Suggest triggering re-run.

Locally it's happy:

$ cargo clippy --all-targets --all-features -- -D warnings
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.77s

@simondanielsson

Contributor Author

CI fixed by this: #169

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a5a448af9

Comment thread src/routers/http/vllm_pd_router.rs
@BugenZhao BugenZhao left a comment

Member

Rest LGTM

Comment thread src/routers/http/vllm_service_discovery.rs
Comment thread src/routers/http/vllm_service_discovery.rs Outdated
@simondanielsson

Contributor Author

@BugenZhao Thanks for the review! Comments fixed, PTAL 🙏

@BugenZhao BugenZhao left a comment

Member

LGTM. Thanks!

@BugenZhao BugenZhao merged commit e667ebb into vllm-project:main May 6, 2026
6 checks passed
@simondanielsson simondanielsson deleted the feature/mori-write-mode branch May 6, 2026 13:25
Development

Successfully merging this pull request may close these issues.

[Feature]: Support MoRI with WRITE mode
