
feat(routing): unified structured Selected-worker INFO log across all router modes#10554

Open
nnshah1 wants to merge 3 commits into main from neelays/router-select-log


@nnshah1 nnshah1 commented Jun 10, 2026


Summary

Every router-mode worker selection now emits one consistent, structured, info-level "Selected worker" log, so routing decisions are visible without flipping DYN_LOG. Previously the modes were inconsistent — RoundRobin/Random/LeastLoaded logged at trace, P2C at debug, DeviceAwareWeighted and KV at info but with ad-hoc fields.

Unified schema

info!(router_mode = "<mode>", worker_id = <id>, candidate_count = <n>, load = <load-if-applicable>, … , "Selected worker: …")

Per mode:

  • RoundRobin / Random / LeastLoaded: promoted trace! → info! with structured fields (LeastLoaded also logs load).
  • PowerOfTwoChoices: debug! → info!, retaining the candidate detail (candidate_a/b, loads); logs once per selection, including the single-candidate path.
  • DeviceAwareWeighted: re-fielded to the common schema, keeping endpoint / is_cpu.
  • KV: kept the existing info! messages, added structured fields (router_mode, worker_id, worker_type, dp_rank, logit, block counts).
  • Direct: added an info! at dispatch (router_mode="direct", worker_id; no candidate_count/load since the worker is caller-supplied).
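To make the per-mode rules concrete, here is a minimal std-only sketch of the schema and of the "one log per selection" behaviour for power-of-two-choices. It is not the code in push_router.rs: the `Selection` type, `p2c_select`, `direct`, the `pick_a`/`pick_b` parameters (stand-ins for two random indices), and the mode string "power_of_two_choices" are all assumptions made for illustration. Only the field names and router_mode="direct" come from the description above.

```rust
/// Hypothetical model of one "Selected worker" event in the unified schema.
/// Fields that do not apply to a mode are None and are left out of the output.
#[derive(Debug, PartialEq)]
struct Selection {
    router_mode: &'static str,
    worker_id: u64,
    candidate_count: Option<usize>,
    load: Option<u64>,
}

impl Selection {
    /// Render the event as key=value pairs, skipping absent fields.
    fn render(&self) -> String {
        let mut out = format!(
            "router_mode=\"{}\" worker_id={}",
            self.router_mode, self.worker_id
        );
        if let Some(n) = self.candidate_count {
            out.push_str(&format!(" candidate_count={}", n));
        }
        if let Some(load) = self.load {
            out.push_str(&format!(" load={}", load));
        }
        out
    }
}

/// Power-of-two-choices over (worker_id, load) pairs.
/// Produces exactly one Selection per call, including the single-candidate path.
/// The mode string used here is an assumed value, not taken from the PR.
fn p2c_select(workers: &[(u64, u64)], pick_a: usize, pick_b: usize) -> Option<Selection> {
    let n = workers.len();
    if n == 0 {
        return None;
    }
    let (worker_id, load) = if n == 1 {
        workers[0]
    } else {
        let a = workers[pick_a % n];
        let b = workers[pick_b % n];
        // Keep the less loaded of the two sampled candidates.
        if a.1 <= b.1 { a } else { b }
    };
    Some(Selection {
        router_mode: "power_of_two_choices",
        worker_id,
        candidate_count: Some(n),
        load: Some(load),
    })
}

/// Direct mode: the caller supplies the worker, so there is no
/// candidate_count or load to report.
fn direct(worker_id: u64) -> Selection {
    Selection {
        router_mode: "direct",
        worker_id,
        candidate_count: None,
        load: None,
    }
}
```

In the actual change these fields would presumably be passed to `info!` rather than rendered by hand; the sketch only shows which fields each path carries.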

Log aggregators can now grep/filter on router_mode / worker_id / load uniformly across all modes for imbalance analysis.
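As a rough illustration of that kind of imbalance analysis, the sketch below counts selections per (router_mode, worker_id) from captured log text. It assumes a plain-text log format in which each field appears as a whitespace-separated `key=value` token and the message contains "Selected worker"; a JSON log format would need a different parser. The function names `field` and `selections_per_worker` are hypothetical helpers, not part of this PR.

```rust
use std::collections::BTreeMap;

/// Return the value of `key=...` from a whitespace-separated log line,
/// with surrounding double quotes removed.
fn field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    line.split_whitespace().find_map(|tok| {
        tok.strip_prefix(key)
            .and_then(|rest| rest.strip_prefix('='))
            .map(|v| v.trim_matches('"'))
    })
}

/// Count "Selected worker" lines per (router_mode, worker_id).
/// Lines missing either field, or with a non-numeric worker_id, are skipped.
fn selections_per_worker(log: &str) -> BTreeMap<(String, u64), usize> {
    let mut counts = BTreeMap::new();
    for line in log.lines().filter(|l| l.contains("Selected worker")) {
        let mode = field(line, "router_mode");
        let id = field(line, "worker_id").and_then(|v| v.parse::<u64>().ok());
        if let (Some(mode), Some(id)) = (mode, id) {
            *counts.entry((mode.to_string(), id)).or_insert(0) += 1;
        }
    }
    counts
}
```

A skewed count for one worker_id within a single router_mode is the sort of signal this uniform schema is meant to expose.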

Test

cargo check / fmt / clippy clean on dynamo-runtime + dynamo-kv-router.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Chores
    • Enhanced observability and structured logging for worker selection in router components, providing more detailed metrics and context for monitoring and troubleshooting purposes.

… router modes

Signed-off-by: nnshah1 <neelays@nvidia.com>
@nnshah1 nnshah1 requested a review from a team as a code owner June 10, 2026 20:05
@github-actions github-actions Bot added the feat label Jun 10, 2026
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>

coderabbitai Bot commented Jun 10, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2a911263-9f26-4873-af5c-bed9765deb03

📥 Commits

Reviewing files that changed from the base of the PR and between a95e0b1 and 1084d0f.

📒 Files selected for processing (2)
  • lib/kv-router/src/scheduling/selector.rs
  • lib/runtime/src/pipeline/network/egress/push_router.rs

Walkthrough

This PR enhances observability in router worker selection by converting formatted log messages to structured tracing::info! logs across KV and Push routers. Changes add explicit fields for worker identity, load metrics, and selection strategy details while elevating logging from trace/debug to info level for better visibility.

Changes

Router Selection Observability

| Layer / File(s) | Summary |
|---|---|
| KV Router selection logging (lib/kv-router/src/scheduling/selector.rs) | DefaultWorkerSelector::select_worker now emits structured tracing::info! logs in both decode and non-decode paths with router_mode = "kv" and explicit fields for worker identity (worker_id, worker_type, dp_rank, logit) and cache metrics (host_pinned_blocks, disk_blocks, effective_cached_blocks). |
| Push Router selection logging (lib/runtime/src/pipeline/network/egress/push_router.rs) | All worker selection methods (p2c_select_from single/multi-candidate, round_robin, random, direct, device_aware_weighted, least_loaded) now emit consistent tracing::info! "Selected worker" logs (elevated from the prior trace or debug level where applicable) with explicit fields for the selected worker id and, where applicable, candidate count and current load. |

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description covers the change summary and details but lacks the required 'Related Issues' section with either a linked issue or confirmation of no related issue. | Add the 'Related Issues' section from the template; either link to a related issue with 'Closes #XXXX' or check the 'Confirmed — no related issue' box. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: unified structured logging for worker selection across all router modes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


…red field

Signed-off-by: nnshah1 <neelays@nvidia.com>