
ci(cli): run CLI unit tests in ut-runtime-1gpu #129

Merged
lightseek-bot merged 1 commit into main from zhyncs/cli-ci-test on May 13, 2026

zhyncs (Member) commented May 13, 2026

Summary

  • Fold python3 -m pytest test/cli -v into the existing ut-runtime-1gpu task so the CLI orchestrator surface (argv splitter, banner, log prefix, proc helpers, dispatch) is exercised on every per-commit run alongside the runtime suite — no new workflow, no duplicate install path.
  • Why piggyback on a GPU runner instead of ubuntu-latest: _engine_recognized_flags() lazy-imports ServerArgs, which transitively pulls triton, flashinfer-python, and tokenspeed_kernel. Those need a CUDA build at install time, so a CPU-only runner can't even finish install_deps.sh. The existing 1-gpu runner already has the full stack staged.
  • Fix test_orchestrator_default_timeouts along the way. The orchestrator default timeout was bumped from 600s to 1800s in #105 (perf: optimize flashinfer sampling backend), but the assertion was left at 600; refresh it so the suite is green when the task runs.
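
The lazy-import behavior described in the second bullet can be sketched as follows. This is a minimal illustration, not the project's actual `_engine_recognized_flags()`: the function name, its parameters, and the idea of deriving flags from dataclass fields are all assumptions made for the example. The point it shows is that the heavy import runs only when the function is first called, so any missing dependency surfaces at call time rather than at module import time.

```python
import dataclasses
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def engine_recognized_flags(module_name: str, class_name: str = "ServerArgs") -> frozenset:
    """Return CLI flags implied by a dataclass, importing its module lazily.

    Hypothetical helper: the real project may derive flags differently.
    """
    # The import happens here, on first call, not when this file is imported.
    # If the dependency stack is missing, ImportError surfaces only now.
    module = importlib.import_module(module_name)
    args_cls = getattr(module, class_name)
    # Assumption for illustration: one "--kebab-case" flag per dataclass field.
    return frozenset(
        "--" + field.name.replace("_", "-") for field in dataclasses.fields(args_cls)
    )
```

Any test that calls such a helper needs the whole import chain to be installable, which is consistent with the reasoning above for running the CLI tests on the runner that already has the full stack.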

Test plan

  • pre-commit run --all-files
  • pytest test/cli -v locally — 57 passed, 1 skipped (the SMG integration test gracefully importorskips smg / smg_grpc_proto, which are not installed in the dev env).
  • ut-runtime-1gpu / b200-1gpu runs the CLI block via the per-commit trigger on this PR; merge gated on it being green.
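
The "1 skipped" result comes from `pytest.importorskip`, which skips a test when an optional module cannot be imported. The sketch below is a standard-library approximation of that behavior using `unittest`, not the test from this PR; the class name, test name, and placeholder assertion are hypothetical.

```python
import importlib.util
import unittest


def missing_modules(*names: str) -> list:
    """Return the top-level module names that cannot be found on this machine."""
    return [name for name in names if importlib.util.find_spec(name) is None]


class SmgIntegrationTest(unittest.TestCase):
    def test_smg_roundtrip(self):
        absent = missing_modules("smg", "smg_grpc_proto")
        if absent:
            # Reported as "skipped" rather than "failed", which is the same
            # outcome pytest.importorskip produces for a missing module.
            self.skipTest("optional dependencies not installed: " + ", ".join(absent))
        # Hypothetical placeholder for the real integration assertions.
        self.assertTrue(True)
```

In a dev environment without `smg` or `smg_grpc_proto`, this test is counted as skipped and the rest of the suite still reports green.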

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d4e7b89fef

Comment thread .github/workflows/cli-test.yml Outdated
torch==2.11.0 torchvision
# tokenspeed-kernel falls back to a pure-Python stub when CUDA is
# absent (see tokenspeed-kernel/python/setup.py).
pip install -e tokenspeed-kernel/python/

P1: Avoid installing GPU kernel package in CPU-only CLI job

This workflow installs tokenspeed-kernel on ubuntu-latest, but the package setup currently computes dependencies via tokenspeed-kernel/python/setup.py::_selected_backend() and raises when neither CUDA nor ROCm is detected (RuntimeError at setup time). On a standard hosted Ubuntu runner, that makes pip install -e tokenspeed-kernel/python/ fail before pytest runs, so the new CLI workflow will go red for reasons unrelated to CLI changes. For this CPU-only suite, skip this install or force a backend explicitly.
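
To make the failure mode and the "force a backend explicitly" option concrete, here is a hedged sketch of a backend selector. It is not the code in `tokenspeed-kernel/python/setup.py`: the environment variable `EXAMPLE_KERNEL_BACKEND`, the `cpu` value, and detection via `nvcc` / `hipcc` on `PATH` are all assumptions made for illustration.

```python
import os
import shutil

VALID_BACKENDS = {"cuda", "rocm", "cpu"}


def select_backend(env=None, which=shutil.which) -> str:
    """Pick a build backend, honoring an explicit override before auto-detection."""
    env = os.environ if env is None else env
    # Hypothetical override variable; the real package may not offer one.
    forced = env.get("EXAMPLE_KERNEL_BACKEND")
    if forced:
        if forced not in VALID_BACKENDS:
            raise ValueError(f"unknown backend: {forced!r}")
        return forced
    # Assumed detection strategy: look for the vendor compilers on PATH.
    if which("nvcc"):
        return "cuda"
    if which("hipcc"):
        return "rocm"
    # Without an override, a CPU-only runner ends up here and the install fails.
    raise RuntimeError("neither CUDA nor ROCm detected")
```

With a selector of this shape, a CPU-only job would either set the override before `pip install -e` or skip the install, which are the two options the review comment proposes.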

Comment thread .github/workflows/cli-test.yml Outdated
branches: [main]
paths:
- "python/tokenspeed/cli/**"
- "python/tokenspeed/runtime/utils/server_args.py"

P2: Expand CLI workflow path filters to cover serve_smg deps

The path filters only include python/tokenspeed/runtime/utils/server_args.py from runtime utils, but tokenspeed.cli.serve_smg also imports tokenspeed.runtime.utils.network and tokenspeed.runtime.utils.process (see python/tokenspeed/cli/serve_smg.py). A PR that breaks those modules can still break CLI behavior/tests while this workflow never triggers, which leaves a gap in the intended CLI regression coverage.
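
One way to keep path filters in step with a module's real imports is to scan its source for them. The sketch below is an illustrative helper, not part of the repository: the function names are hypothetical, and it assumes each dotted module maps to a single `.py` file under a `python/` root, as the existing filter entries suggest.

```python
import ast


def imported_modules(source: str, prefix: str) -> list:
    """List absolute imports in `source` whose module name starts with `prefix`."""
    found = set()
    # ast.walk also visits imports nested inside functions (lazy imports).
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(a.name for a in node.names if a.name.startswith(prefix))
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Limitation: "from pkg import submodule" is recorded as "pkg".
            if node.module.startswith(prefix):
                found.add(node.module)
    return sorted(found)


def to_path_filter(module: str, root: str = "python") -> str:
    """Map a dotted module name to a workflow path entry (assumes a .py file)."""
    return f"{root}/{module.replace('.', '/')}.py"
```

Running `imported_modules` over the text of `python/tokenspeed/cli/serve_smg.py` with the prefix `tokenspeed.runtime` and mapping the results through `to_path_filter` would yield candidate entries for the `paths:` list; packages (directories with `__init__.py`) would need a glob instead of a single file.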

zhyncs force-pushed the zhyncs/cli-ci-test branch from d4e7b89 to cad92df on May 13, 2026 19:16
zhyncs changed the title from "ci(cli): run CLI unit tests on ubuntu-latest" to "ci(cli): register CLI unit tests under ut-cli on b200-1gpu" on May 13, 2026
Fold `pytest test/cli` into the existing `ut-runtime-1gpu` task so the
CLI orchestrator surface (argv splitter, banner, log prefix, proc
helpers, dispatch) is exercised on every per-commit run alongside the
runtime suite. CLI tests transitively import `ServerArgs`, which pulls
`triton`, `flashinfer-python`, and `tokenspeed_kernel`; piggybacking on
the GPU runner that already installs those deps avoids spinning up a
separate workflow with a duplicate install path.

The orchestrator timeout default was bumped from 600s to 1800s in #105
without updating `test_orchestrator_default_timeouts`; refresh the
expected value so the suite is green before turning the task on.

Signed-off-by: zhyncs <46627482+zhyncs@users.noreply.github.com>
zhyncs force-pushed the zhyncs/cli-ci-test branch from cad92df to e82e063 on May 13, 2026 19:20
zhyncs changed the title from "ci(cli): register CLI unit tests under ut-cli on b200-1gpu" to "ci(cli): run CLI unit tests in ut-runtime-1gpu" on May 13, 2026
lightseek-bot merged commit 6333e23 into main on May 13, 2026
2 of 3 checks passed
lightseek-bot deleted the zhyncs/cli-ci-test branch on May 13, 2026 19:21
jasl added a commit to jasl/tokenspeed that referenced this pull request May 13, 2026
…ightseekorg#131)

Squash-rebase of ``codex/ds4-sm12x-poc`` onto ``upstream/main`` at
``dd9866f`` (Refine third-party attribution notices, lightseekorg#131). Picks up
nine upstream commits:

* ``dd9866f`` Refine third-party attribution notices (lightseekorg#131)
* ``b6c4617`` feat(cli): disable smg circuit breaker and retries by
  default (lightseekorg#130)
* ``f55fd2a`` feat(cli): accept positional model arg in ``ts serve``
  (lightseekorg#128)
* ``6333e23`` ci(cli): run CLI unit tests in ``ut-runtime-1gpu`` (lightseekorg#129)
* ``db7cae6`` feat(cli): print TokenSpeed banner on ``ts serve``
  startup (lightseekorg#127)
* ``361eb09`` perf(K2.5): Optimize lm_head (lightseekorg#126)
* ``c2299fd`` perf: optimize flashinfer sampling backend (lightseekorg#105)
* ``962b83a`` perf(K2.5): enable AR-Norm fusion and fused FP8 decode
  for MLA Eagle3 (lightseekorg#124)
* ``4da7a1c`` fix to use max_num_pages for spec-decode topk
  page_table buffers (lightseekorg#125)

Fork delta replayed on top of the new base: 82 files changed,
+22833 / -373.

Conflict resolution:

* ``python/tokenspeed/runtime/sampling/backends/flashinfer_full.py``
  imports — took upstream's wider import block from lightseekorg#105 (added
  ``top_k_top_p_renorm_torch`` and ``write_output_top_logprobs``);
  references on lines 333 and 471 require them.

Pre-rebase state preserved at branch ``codex/ds4-sm12x-poc-prerebase-
20260514`` for safety; previous round's backup
``codex/ds4-sm12x-poc-prerebase`` still tracks remote.

Fork-specific work carried forward in this squash:

* SM12x sparse-MLA + DSv4-Flash output projection + MXFP4 MoE kernels
  (``tokenspeed-kernel/python/tokenspeed_kernel/thirdparty/cuda/``).
* DSv4-Flash runtime model + attention backend + per-kind K-split
  sparse MLA + indexer ds4-decode shortcut.
* V2 Stage 1 attention aux-stream (post-projection overlap on SM12x).
* Bench tool ``await_with_per_request_timeout`` +
  ``sock_read=120`` / ``sock_connect=30`` + ``test_bench_timeout.py``.
* All experiment archives + failed-attempts log in ``docs/notes/``.

Test plan:

* AST-parse sanity on the 63 staged Python files: clean.
* Pre-commit + workstation rebuild + sanity test sweep to follow on a
  separate turn before pushing.

Signed-off-by: jasl <jasl9187@hotmail.com>