Add StickySessionRoutingStrategy for GeneratorRouter by pzhan9 · Pull Request #3625 · pytorch/torchtitan

pzhan9 · 2026-06-10T21:21:06Z

As title.

tianyu-l · 2026-06-10T22:12:07Z

+        if config.max_sessions <= 0:
+            raise ValueError(
+                f"max_sessions must be positive, got {config.max_sessions}"
+            )


This validation should go in the config's __post_init__.
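A minimal sketch of what that could look like, assuming the strategy's config is a plain dataclass. The class name `StickySessionConfig` and the default value are hypothetical and not taken from the PR:

```python
from dataclasses import dataclass


@dataclass
class StickySessionConfig:
    """Hypothetical stand-in for the strategy's config (name is an assumption)."""

    # Upper bound on tracked sessions; the default here is illustrative only.
    max_sessions: int = 1024

    def __post_init__(self) -> None:
        # Validate at construction time so a bad config fails before any
        # strategy object is built from it.
        if self.max_sessions <= 0:
            raise ValueError(
                f"max_sessions must be positive, got {self.max_sessions}"
            )
```

With this shape, the strategy constructor no longer needs its own check, and an invalid config is rejected wherever it is created.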

tianyu-l · 2026-06-10T22:14:20Z

+        if len(self._sessions) > self._max_sessions:
+            self._sessions.popitem(last=False)


Please add a TODO to log this eviction with the structured logger (cc @felipemello1).

Maybe use logger.warning for now?
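One possible shape for the interim warning, sketched with the standard `logging` module. `SessionTable` and `record` are hypothetical names used only to keep the example self-contained; the eviction itself mirrors the `popitem(last=False)` call in the diff:

```python
import logging
from collections import OrderedDict

logger = logging.getLogger(__name__)


class SessionTable:
    """Hypothetical helper wrapping the session -> generator LRU map."""

    def __init__(self, max_sessions: int) -> None:
        self._max_sessions = max_sessions
        self._sessions: OrderedDict = OrderedDict()

    def record(self, session_id: str, generator: object):
        """Bind session_id to generator; return the evicted session id, if any."""
        self._sessions[session_id] = generator
        self._sessions.move_to_end(session_id)  # mark as most recently used
        if len(self._sessions) > self._max_sessions:
            # Drop the least recently used binding (front of the OrderedDict).
            evicted_id, _ = self._sessions.popitem(last=False)
            # TODO: switch to the structured logger once it is available.
            logger.warning(
                "Sticky-session table exceeded max_sessions=%d; evicted session %r",
                self._max_sessions,
                evicted_id,
            )
            return evicted_id
        return None
```

Returning the evicted id is a choice made for this sketch so the behavior is easy to assert on in a test; it is not something the PR does.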

tianyu-l · 2026-06-10T22:17:59Z

    estimated_cost: int = 1
    """Estimated request cost used by load-aware routing strategies."""

+    session_id: str | None = None


Where are you assigning this to a context?

Also, this leaks unnecessary info to non-sticky routing strategies.

tianyu-l · 2026-06-10T22:19:43Z

+        if sticky_generator is not None and any(
+            h is sticky_generator for h in candidates
+        ):
+            self._sessions.move_to_end(routing_ctx.session_id)
+            return sticky_generator


It seems that if a generator is in weight-sync, you'd rebind the session to a new generator instead of waiting until this generator's weight-sync finishes.

Do we know whether this trade-off is worth it, especially for extremely long-horizon rollouts?

tianyu-l · 2026-06-10T22:20:14Z

+
+        chosen = self._fallback_strategy.choose(routing_ctx, candidates)
+        self._sessions[routing_ctx.session_id] = chosen
+        self._sessions.move_to_end(routing_ctx.session_id)


please add more comments on the strategy here
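For reference, here is a commented sketch of how the quoted pieces appear to fit together. `RoutingContext`, `Handle`, `LeastLoaded`, and `StickySketch` are hypothetical stand-ins, not the PR's classes, and the handling of a missing `session_id` is an assumption:

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoutingContext:  # hypothetical stand-in for the routing context
    estimated_cost: int = 1
    session_id: Optional[str] = None


@dataclass(eq=False)  # identity comparison, matching the `h is sticky` check
class Handle:  # hypothetical stand-in for a generator handle
    name: str
    reserved_load: int = 0


class LeastLoaded:  # fallback: pick the candidate with the least reserved load
    def choose(self, routing_ctx, candidates):
        return min(candidates, key=lambda h: h.reserved_load)


class StickySketch:
    def __init__(self, fallback, max_sessions: int) -> None:
        self._fallback = fallback
        self._max_sessions = max_sessions
        self._sessions: OrderedDict = OrderedDict()  # session_id -> handle, LRU order

    def choose(self, routing_ctx, candidates):
        sid = routing_ctx.session_id
        if sid is None:
            # Assumption: requests without a session are not tracked.
            return self._fallback.choose(routing_ctx, candidates)
        sticky = self._sessions.get(sid)
        if sticky is not None and any(h is sticky for h in candidates):
            # Known session whose generator is still eligible: reuse it.
            self._sessions.move_to_end(sid)
            return sticky
        # New session, or its generator is not among the candidates
        # (for example while it is unavailable): rebind via the fallback.
        chosen = self._fallback.choose(routing_ctx, candidates)
        self._sessions[sid] = chosen
        self._sessions.move_to_end(sid)
        if len(self._sessions) > self._max_sessions:
            self._sessions.popitem(last=False)  # evict least recently used
        return chosen
```

Note that in this sketch a session is rebound permanently once its generator drops out of the candidate list, even if the original generator later returns.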

tianyu-l · 2026-06-10T22:23:20Z

        return min(candidates, key=lambda h: h.reserved_load)


+class StickySessionRoutingStrategy(RoutingStrategy):


The logic around assigning context.session_id is unclear, as it's not used anywhere: https://github.com/pytorch/torchtitan/blob/main/torchtitan/experiments/rl/trainer.py#L592

Most importantly, we should make sure that all copies of the same prompt within a single GRPO group are routed to the same generator.
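One way to get that property is to key the session on the group rather than on the individual sample. The sketch below only illustrates the idea; `group_session_id`, `group_request_keys`, and the id format are hypothetical and do not reflect how the trainer builds ids:

```python
def group_session_id(step: int, group_idx: int) -> str:
    """Hypothetical sticky-routing key shared by every sample in a GRPO group."""
    return f"step{step}-group{group_idx}"


def group_request_keys(step: int, group_idx: int, group_size: int):
    """Return (session_id, request_id) pairs for one group.

    The session_id is identical across the group, so a sticky strategy maps
    all samples to the same generator; the request_id stays unique per sample.
    """
    session_id = group_session_id(step, group_idx)
    return [(session_id, f"{session_id}-sample{i}") for i in range(group_size)]
```

A per-rollout session id would instead keep each multi-turn rollout on one generator while letting samples of the same group spread out, so which key to use depends on which locality matters more.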

tianyu-l · 2026-06-10T22:24:46Z

Please extend the multi-generator CI test to cover this feature.

…g seam; simplify AlphabetSort example

- rollout/types.py: GenerateFn is now a Protocol with an explicit signature (prompt_token_ids/request_id/session_id/sampling_config -> Completion|None), not a loose Callable.
- session_id seam: run_single_rollout passes a stable per-rollout session_id (sticky-routing key) plus a per-turn request_id, threaded rollouter -> generate_fn -> generator.generate. A single generator ignores session_id; ready for the multi-generator router (pytorch#3583/pytorch#3625).
- run_single_rollout takes rollout_id (built in run_group_rollouts) instead of sample_idx.
- examples/alphabet_sort/env.py: fixed format example ending in "..." (dropped the randomized placeholder-row machinery); restored the original docstrings.
- docstring/comment cleanups (plain wording).

Add sticky-session generator routing

97a57c7

pytorch-bot Bot added ciflow/8gpu ciflow/rl labels Jun 10, 2026

meta-cla Bot added the CLA Signed label Jun 10, 2026

tianyu-l requested changes Jun 10, 2026


felipemello1 mentioned this pull request Jun 11, 2026

[rl] continuous-batching generator + multi-turn rollouts #3593

Open

pzhan9 wants to merge 1 commit into pytorch:main from pzhan9:pzhang/sticky-session-routing

2 participants
