Abort in-flight requests when the client disconnects by merceod · Pull Request #136 · mstar-project/mstar

merceod · 2026-06-20T12:03:02Z

Propagate client cancellation from the API server to the conductor, which removes the request from its workers and frees KV-cache pages and the concurrency slot instead of running cancelled requests to completion.

What does this PR do?

Closes Issue #135.

When a client cancels a request, M* now aborts it instead of running it to EOS/max_tokens. The API server detects the disconnect (for streaming responses via the response generator's teardown, for non-streaming responses via Request.is_disconnected()) and signals the conductor, which removes the request from its workers and admits the next queued request. The worker-side teardown is unchanged; this PR adds the missing trigger.
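The two detection paths described above can be sketched with plain asyncio. This is a minimal illustration, not the code in this PR: `FakeConductor`, `stream_tokens`, and `run_until_done_or_disconnect` are hypothetical names, and `is_disconnected` stands in for an async check such as Starlette's `Request.is_disconnected()`.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, List, Optional


class FakeConductor:
    """Hypothetical stand-in for the conductor; it only records abort signals."""

    def __init__(self) -> None:
        self.aborted: List[str] = []

    def abort(self, request_id: str) -> None:
        self.aborted.append(request_id)


async def stream_tokens(
    conductor: FakeConductor, request_id: str, tokens: List[str]
) -> AsyncIterator[str]:
    """Streaming path: tearing down the generator early is the cancel trigger."""
    finished = False
    try:
        for tok in tokens:
            yield tok
            await asyncio.sleep(0)  # give the event loop a chance to run
        finished = True
    finally:
        # Reached on normal completion and when the consumer closes the
        # generator early (for example because the client went away).
        if not finished:
            conductor.abort(request_id)


async def run_until_done_or_disconnect(
    conductor: FakeConductor,
    request_id: str,
    work: Awaitable[Any],
    is_disconnected: Callable[[], Awaitable[bool]],
    poll_interval: float = 0.01,
) -> Optional[Any]:
    """Non-streaming path: poll for a disconnect while the work is running."""
    task = asyncio.ensure_future(work)
    try:
        while True:
            done, _ = await asyncio.wait({task}, timeout=poll_interval)
            if done:
                return task.result()
            if await is_disconnected():
                conductor.abort(request_id)
                return None
    finally:
        if not task.done():
            task.cancel()  # stop waiting on work nobody will read
```

In both paths the abort is sent only when the request did not finish on its own, so a request that completes normally never triggers a removal.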

How was it tested?

ruff check . and test/modular/test_openai_router.py pass.
Live Orpheus server with max_concurrent_requests=1: a cancelled request previously held the only slot until natural EOS (~7s); now the slot frees within ~2ms of disconnect and the next request starts immediately. Verified for the streaming, non-streaming, and timeout paths.
Normal outputs unchanged across Orpheus (audio), BAGEL (text + image), Pi0.5 (actions), and V-JEPA 2 (rollout).
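The single-slot scenario from the live test can be modelled with a toy admission queue. This is only a sketch of the slot accounting: `ToyConductor` and its fields are hypothetical, and the `removed` list merely stands in for the REMOVE_REQUEST message sent to workers.

```python
from collections import deque
from typing import Deque, List, Set


class ToyConductor:
    """Hypothetical model of slot accounting; not the real conductor."""

    def __init__(self, max_concurrent_requests: int) -> None:
        self.max_concurrent_requests = max_concurrent_requests
        self.active: Set[str] = set()
        self.queued: Deque[str] = deque()
        self.removed: List[str] = []  # stands in for REMOVE_REQUEST to workers

    def submit(self, request_id: str) -> None:
        self.queued.append(request_id)
        self._admit()

    def abort(self, request_id: str) -> bool:
        """Drop a request; return False if it is unknown or already gone."""
        if request_id in self.active:
            self.active.remove(request_id)
            self.removed.append(request_id)
            self._admit()  # the freed slot goes to the next queued request
            return True
        if request_id in self.queued:
            self.queued.remove(request_id)
            return True
        return False

    def _admit(self) -> None:
        while self.queued and len(self.active) < self.max_concurrent_requests:
            self.active.add(self.queued.popleft())
```

With `max_concurrent_requests=1`, aborting the active request is what lets the queued one start; without the abort it would wait until the first request ran to completion.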

Checklist

ruff check . passes
Added or updated tests / docs where relevant

NSagan271

Looks good to me (just some nitpicks). I'd do some more testing of in-flight removes to make sure that our worker / speculation path is robust to them, since REMOVE_REQUEST has previously been sent only after all workers are done.

NSagan271 · 2026-06-20T17:37:28Z

+        finished = False
+        try:
+            while True:
+                if time.time() - start > self.timeout_seconds:


Probably for another PR (I can raise an issue), but I think it could be worthwhile to be able to raise/lower timeout_seconds per request? e.g., to fail fast if we know the request should be short, or to bump it up for a longer request. Thoughts?
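One possible shape for the per-request override suggested here, as a sketch only: `effective_timeout`, `wait_for_result`, and the `max_seconds` cap are hypothetical and not part of this PR. The loop uses `time.monotonic()` rather than `time.time()` so that wall-clock adjustments cannot shorten or extend the deadline.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def effective_timeout(
    default_seconds: float,
    override_seconds: Optional[float] = None,
    max_seconds: float = 600.0,
) -> float:
    """Pick the timeout for one request (hypothetical helper)."""
    if override_seconds is None:
        return default_seconds
    if override_seconds <= 0:
        raise ValueError("timeout override must be positive")
    # Cap the override so a client cannot hold a slot indefinitely.
    return min(override_seconds, max_seconds)


def wait_for_result(
    poll: Callable[[], Optional[T]],
    timeout_seconds: float,
    sleep_seconds: float = 0.001,
) -> T:
    """Poll until a result is available or the deadline passes."""
    start = time.monotonic()
    while True:
        result = poll()
        if result is not None:
            return result
        if time.monotonic() - start > timeout_seconds:
            raise TimeoutError(f"no result within {timeout_seconds}s")
        time.sleep(sleep_seconds)
```

A short request could then pass a small override to fail fast, while a long one could raise it up to the cap.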

NSagan271 · 2026-06-20T17:43:00Z

+        if not active:
+            return
+        logger.info("Client cancelled request %s; releasing resources", request_id)
+        self.preprocess_worker.abort_request(request_id)


nit: is it cleaner to have preprocess_worker.abort_request also do the cleanup path, instead of the caller having to remember to call both?
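A sketch of what this could look like, assuming the cleanup can be expressed as a method on the worker. `ToyPreprocessWorker`, `_cleanup`, and the `pending` map are hypothetical and only illustrate the shape of the API.

```python
from typing import Dict, List


class ToyPreprocessWorker:
    """Hypothetical worker whose abort_request owns the whole teardown."""

    def __init__(self) -> None:
        self.pending: Dict[str, List[int]] = {}
        self.cleaned: List[str] = []

    def add(self, request_id: str, payload: List[int]) -> None:
        self.pending[request_id] = payload

    def _cleanup(self, request_id: str) -> None:
        # Placeholder for whatever the caller currently does as a second step.
        self.cleaned.append(request_id)

    def abort_request(self, request_id: str) -> bool:
        """Abort and clean up in one call; safe to call more than once."""
        if self.pending.pop(request_id, None) is None:
            return False  # unknown or already aborted: nothing to clean up
        self._cleanup(request_id)
        return True
```

Folding the cleanup into `abort_request` also makes a repeated abort harmless, because the second call finds nothing to remove.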

NSagan271 · 2026-06-20T17:51:50Z

+                return
+
+        request_data = self.requests.get(request_id)
+        if request_data is None:


nit: have a logger.info / warning here
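For illustration, the early return could log before bailing out. This sketch uses the standard `logging` module; `lookup_request` and the logger name are hypothetical.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("toy_conductor")


def lookup_request(
    requests: Dict[str, Any], request_id: str
) -> Optional[Any]:
    """Return the request's data, logging when it is already gone."""
    request_data = requests.get(request_id)
    if request_data is None:
        # An abort can race with normal completion, so this is expected
        # occasionally; the log line makes the skipped removal visible.
        logger.warning(
            "Request %s not found; it may already have been removed", request_id
        )
        return None
    return request_data
```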

Abort in-flight requests when the client disconnects

6e66ea4

merceod requested review from NSagan271, kamahori, sivginirmak and zhudianGG June 20, 2026 12:03

merceod mentioned this pull request Jun 20, 2026

Support request abortion from client cancelation #135

Open

NSagan271 reviewed Jun 20, 2026

NSagan271 and others added 4 commits June 20, 2026 13:07

fix race condition (#137)

6516808

* fix race condition * Guard schedule of removed rids

Merge main into fix/cancelled-request-gpu-leak

9a636c7

Ack abandoned result tensors so the producing worker can reclaim them

b19983b

Avoid infinite loop replaying buffered messages for a removed request

bb14efc

merceod wants to merge 5 commits into main from fix/cancelled-request-gpu-leak