ci(cli): run CLI unit tests in ut-runtime-1gpu #129
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d4e7b89fef
    torch==2.11.0 torchvision
    # tokenspeed-kernel falls back to a pure-Python stub when CUDA is
    # absent (see tokenspeed-kernel/python/setup.py).
    pip install -e tokenspeed-kernel/python/
Avoid installing GPU kernel package in CPU-only CLI job
This workflow installs tokenspeed-kernel on ubuntu-latest, but the package setup currently computes dependencies via tokenspeed-kernel/python/setup.py::_selected_backend() and raises when neither CUDA nor ROCm is detected (RuntimeError at setup time). On a standard hosted Ubuntu runner, that makes pip install -e tokenspeed-kernel/python/ fail before pytest runs, so the new CLI workflow will go red for reasons unrelated to CLI changes. For this CPU-only suite, skip this install or force a backend explicitly.
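The "force a backend explicitly" option could look roughly like the sketch below. Only `_selected_backend()` and the `RuntimeError` come from the comment above; the function name `select_backend`, the environment variable `TOKENSPEED_KERNEL_BACKEND`, the `"stub"` value, and detection via `nvcc` / `hipcc` on `PATH` are all assumptions made for illustration, not the actual contents of `setup.py`.

```python
import os
import shutil


def select_backend(env=None, which=shutil.which):
    """Pick a kernel backend; hypothetical stand-in for _selected_backend()."""
    env = os.environ if env is None else env

    # Assumption: an explicit override wins over detection.
    # The variable name TOKENSPEED_KERNEL_BACKEND is hypothetical.
    forced = env.get("TOKENSPEED_KERNEL_BACKEND")
    if forced:
        if forced not in {"cuda", "rocm", "stub"}:
            raise ValueError(f"unknown backend: {forced!r}")
        return forced

    # Assumption: toolchains are detected by looking for their compilers.
    if which("nvcc"):
        return "cuda"
    if which("hipcc"):
        return "rocm"

    # Mirrors the failure mode described in the review comment.
    raise RuntimeError(
        "neither CUDA nor ROCm detected; set TOKENSPEED_KERNEL_BACKEND"
    )
```

With an override of this kind, a CPU-only job could export the variable before `pip install -e tokenspeed-kernel/python/` instead of skipping the install entirely.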
    branches: [main]
    paths:
    - "python/tokenspeed/cli/**"
    - "python/tokenspeed/runtime/utils/server_args.py"
Expand CLI workflow path filters to cover serve_smg deps
The path filters only include python/tokenspeed/runtime/utils/server_args.py from runtime utils, but tokenspeed.cli.serve_smg also imports tokenspeed.runtime.utils.network and tokenspeed.runtime.utils.process (see python/tokenspeed/cli/serve_smg.py). A PR that breaks those modules can still break CLI behavior/tests while this workflow never triggers, which leaves a gap in the intended CLI regression coverage.
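One way to keep such path filters in sync is to derive them from the module's imports instead of maintaining them by hand. The sketch below uses the standard-library `ast` module; the helper names and the example source text are illustrative, and the real import lines in `serve_smg.py` may be written differently.

```python
import ast


def imported_modules(source, prefix):
    """Return sorted dotted module names imported in `source` under `prefix`."""
    found = set()
    # ast.walk also visits function bodies, so lazy imports are included.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
    return sorted(m for m in found if m == prefix or m.startswith(prefix + "."))


def to_path_filter(module, root="python"):
    """Map a dotted module name to a repo-relative path (assumes a plain .py file)."""
    return f"{root}/{module.replace('.', '/')}.py"


# Illustrative source text only; not copied from the repository.
EXAMPLE = """
import os
import tokenspeed.runtime.utils.network
import tokenspeed.runtime.utils.process

def _engine_recognized_flags():
    from tokenspeed.runtime.utils.server_args import ServerArgs
    return ServerArgs
"""

paths = [to_path_filter(m) for m in imported_modules(EXAMPLE, "tokenspeed.runtime")]
for p in paths:
    print(f'- "{p}"')
```

This only sees imports written directly in the file it parses. It does not follow transitive imports, and `from package import submodule` is reported as the package rather than the submodule, so the output is a starting point for the filter list rather than a complete one.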
d4e7b89 to cad92df
ut-cli on b200-1gpu
Fold `pytest test/cli` into the existing `ut-runtime-1gpu` task so the CLI orchestrator surface (argv splitter, banner, log prefix, proc helpers, dispatch) is exercised on every per-commit run alongside the runtime suite.

CLI tests transitively import `ServerArgs`, which pulls `triton`, `flashinfer-python`, and `tokenspeed_kernel`; piggybacking on the GPU runner that already installs those deps avoids spinning up a separate workflow with a duplicate install path.

The orchestrator timeout default was bumped from 600s to 1800s in #105 without updating `test_orchestrator_default_timeouts`; refresh the expected value so the suite is green before turning the task on.

Signed-off-by: zhyncs <46627482+zhyncs@users.noreply.github.com>
cad92df to e82e063
ut-cli on b200-1gpu → ut-runtime-1gpu
…ightseekorg#131)

Squash-rebase of ``codex/ds4-sm12x-poc`` onto ``upstream/main`` at ``dd9866f`` (Refine third-party attribution notices, lightseekorg#131). Picks up nine upstream commits:

* ``dd9866f`` Refine third-party attribution notices (lightseekorg#131)
* ``b6c4617`` feat(cli): disable smg circuit breaker and retries by default (lightseekorg#130)
* ``f55fd2a`` feat(cli): accept positional model arg in ``ts serve`` (lightseekorg#128)
* ``6333e23`` ci(cli): run CLI unit tests in ``ut-runtime-1gpu`` (lightseekorg#129)
* ``db7cae6`` feat(cli): print TokenSpeed banner on ``ts serve`` startup (lightseekorg#127)
* ``361eb09`` perf(K2.5): Optimize lm_head (lightseekorg#126)
* ``c2299fd`` perf: optimize flashinfer sampling backend (lightseekorg#105)
* ``962b83a`` perf(K2.5): enable AR-Norm fusion and fused FP8 decode for MLA Eagle3 (lightseekorg#124)
* ``4da7a1c`` fix to use max_num_pages for spec-decode topk page_table buffers (lightseekorg#125)

Fork delta replayed on top of the new base: 82 files changed, +22833 / -373.

Conflict resolution:

* ``python/tokenspeed/runtime/sampling/backends/flashinfer_full.py`` imports: took upstream's wider import block from lightseekorg#105 (added ``top_k_top_p_renorm_torch`` and ``write_output_top_logprobs``); references on lines 333 and 471 require them.

Pre-rebase state preserved at branch ``codex/ds4-sm12x-poc-prerebase-20260514`` for safety; previous round's backup ``codex/ds4-sm12x-poc-prerebase`` still tracks remote.

Fork-specific work carried forward in this squash:

* SM12x sparse-MLA + DSv4-Flash output projection + MXFP4 MoE kernels (``tokenspeed-kernel/python/tokenspeed_kernel/thirdparty/cuda/``).
* DSv4-Flash runtime model + attention backend + per-kind K-split sparse MLA + indexer ds4-decode shortcut.
* V2 Stage 1 attention aux-stream (post-projection overlap on SM12x).
* Bench tool ``await_with_per_request_timeout`` + ``sock_read=120`` / ``sock_connect=30`` + ``test_bench_timeout.py``.
* All experiment archives + failed-attempts log in ``docs/notes/``.

Test plan:

* AST-parse sanity on the 63 staged Python files: clean.
* Pre-commit + workstation rebuild + sanity test sweep to follow on a separate turn before pushing.

Signed-off-by: jasl <jasl9187@hotmail.com>
Summary
- Fold `python3 -m pytest test/cli -v` into the existing `ut-runtime-1gpu` task so the CLI orchestrator surface (argv splitter, banner, log prefix, proc helpers, dispatch) is exercised on every per-commit run alongside the runtime suite: no new workflow, no duplicate install path.
- `ubuntu-latest`: `_engine_recognized_flags()` lazy-imports `ServerArgs`, which transitively pulls `triton`, `flashinfer-python`, and `tokenspeed_kernel`. Those need a CUDA build at install time, so a CPU-only runner can't even finish `install_deps.sh`. The existing 1-gpu runner already has the full stack staged.
- Fix `test_orchestrator_default_timeouts` along the way. The orchestrator default was bumped from 600s to 1800s in #105 (perf: optimize flashinfer sampling backend), but the assertion was left at 600; refresh it so the suite is green when the task runs.

Test plan

- `pre-commit run --all-files`
- `pytest test/cli -v` locally: 57 passed, 1 skipped (the SMG integration test gracefully `importorskip`s `smg` / `smg_grpc_proto`, which are not installed in the dev env).
- `ut-runtime-1gpu / b200-1gpu` runs the CLI block via the per-commit trigger on this PR; merge gated on it being green.
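
The skip behavior mentioned in the test plan relies on `pytest.importorskip`. For readers unfamiliar with the pattern, here is a minimal standard-library analogue using `unittest` and `importlib.util.find_spec`. It is a swapped-in technique for illustration, not the repository's actual test: the class and helper names are hypothetical, and only the module names `smg` and `smg_grpc_proto` come from the text above.

```python
import importlib.util
import unittest

# Module names taken from the test plan; availability depends on the environment.
OPTIONAL_MODULES = ["smg", "smg_grpc_proto"]


def missing_modules(names):
    """Return the subset of top-level module `names` not found on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


_MISSING = missing_modules(OPTIONAL_MODULES)


# Skip the whole class when any optional dependency is absent, so the run
# reports "skipped" instead of failing with an ImportError.
@unittest.skipIf(_MISSING, f"optional modules not installed: {_MISSING}")
class SmgIntegrationTest(unittest.TestCase):
    def test_modules_importable(self):
        for name in OPTIONAL_MODULES:
            self.assertIsNotNone(importlib.util.find_spec(name))


if __name__ == "__main__":
    unittest.main()
```

In an environment without those modules the test is reported as skipped rather than failed, which matches the "1 skipped" result in the local run.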