ci(trtllm): install pre-release wheel from PyPI instead of building from source by key4ng · Pull Request #1501 · lightseekorg/smg

key4ng · 2026-05-17T06:35:20Z

Description

Problem

The e2e-1gpu-chat (trtllm) CI step has been hanging for ~30 min on every PR because Setup TRT-LLM backend falls into the from-source build path. Root cause:

.github/actions/setup-trtllm/action.yml keys the wheel cache on hashFiles('scripts/ci_install_trtllm.sh').
GitHub Actions evicts caches after 7 days of no access. The repo currently has 0 caches matching trtllm-wheel-* (out of 2476 total).
On cache miss, the script clones TensorRT-LLM main, runs git lfs pull, and invokes scripts/build_wheel.py --cuda_architectures "90-real" — a full CMake C++ compile.
Even when a PR run rebuilds and saves the cache, that cache lives on the PR branch only. Actions cache scoping means subsequent PR branches can't read it unless main itself has a warm cache.
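
To illustrate why a content-hashed key behaves this way, here is a minimal sketch that derives a key from the SHA-256 of a stand-in script file. It is only an analogy: GitHub's hashFiles() uses its own algorithm, and the temp file, function, and variable names below are hypothetical. The trtllm-wheel- prefix comes from the cache pattern mentioned above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for scripts/ci_install_trtllm.sh (hypothetical temp file).
script_file=$(mktemp)
trap 'rm -f "$script_file"' EXIT
printf 'echo "install v1"\n' > "$script_file"

# Approximate a content-derived cache key; NOT the real hashFiles() algorithm.
cache_key() {
    printf 'trtllm-wheel-%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

key_before=$(cache_key "$script_file")
key_unchanged=$(cache_key "$script_file")

# Editing the script changes the key, so the next run starts from a cold cache.
printf 'echo "install v2"\n' > "$script_file"
key_after_edit=$(cache_key "$script_file")

echo "unchanged script, same key: $([ "$key_before" = "$key_unchanged" ] && echo yes || echo no)"
echo "edited script, same key:    $([ "$key_before" = "$key_after_edit" ] && echo yes || echo no)"
```

Since the key is stable while the script is unchanged, an evicted entry only comes back when some run misses the cache and pays for the full rebuild.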

Example: https://github.com/lightseekorg/smg/actions/runs/25981764002/job/76372164127 stuck >34 min on "Setup TRT-LLM backend" while every other step was pending.

Solution

Install the published tensorrt-llm==1.3.0rc14 pre-release wheel from PyPI. Verification that the fixes which motivated the source build are present in rc14:

TRT-LLM PR	Title	In rc14
#11037	gRPC server for high-performance external router integration	YES (since v1.3.0rc2)
#12045	Fix harmony parsers for agentic coding	YES
#12467	Harmony Parser Delta Grouping + Reuse Report	YES

Saves ~30 min/run and removes the brittle wheel-cache dependency entirely.

Changes

scripts/ci_install_trtllm.sh: drop the cached-wheel short-circuit and the from-source build path; replace with a single pip install --pre --extra-index-url https://download.pytorch.org/whl/cu130 tensorrt-llm==1.3.0rc14. Runtime apt deps (libnvinfer10, cuda-toolkit-13-0, libopenmpi-dev), NCCL pin, CUDA env setup, LD_LIBRARY_PATH propagation, and the gRPC serve smoke test are preserved.
.github/actions/setup-trtllm/action.yml: remove the now-redundant actions/cache/restore and actions/cache/save steps.

Net diff: +28 / −304.

Caveats

Two newer GPT-OSS router fixes — #13675 (2026-05-04) and #13798 (2026-05-10) — landed after rc14 was cut and are not in this wheel. If the e2e suite hits those code paths we'll see regressions; the next pre-release should pick them up.
This is a CI-only change — no runtime/library code in model_gateway or any crate is touched.

Test Plan

CI must run the trtllm e2e job on this PR. Specifically:

e2e-1gpu-chat (trtllm) / run passes.
Setup TRT-LLM backend completes in well under 30 min (target: a few minutes, dominated by apt + pip download).
python3 -m tensorrt_llm.commands.serve serve --help smoke test still succeeds (proves PR #11037 gRPC serve is present in the installed wheel).
Harmony / gpt-oss test classes enabled by test(e2e): enable Harmony tests on vLLM and TRT-LLM backends #801 still pass on the trtllm backend.

Checklist

cargo +nightly fmt passes (no Rust touched)
cargo clippy --all-targets --all-features -- -D warnings passes (no Rust touched)
(Optional) Documentation updated
(Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

Chores
- CI setup simplified: TensorRT-LLM is now installed as a pre-release wheel from PyPI instead of being built or cached locally.
- CI environment and install steps streamlined (venv creation, dependency installation, library path configuration) and CI verification messaging clarified to reflect the PyPI-based installation.

…rom source

PR #11037 (gRPC serve) and the Harmony parser fixes that motivated the source build (#12045, #12467) are all included in tensorrt-llm 1.3.0rc14 on PyPI. Pinning to the published pre-release saves ~30 min of CMake compile time per CI run and removes the brittle wheel-cache dance whose key was evicted after 7 days of inactivity.

Drop the cached-wheel + source-build branches from ci_install_trtllm.sh and the now-redundant cache restore/save steps from the composite action. The runtime apt deps, NCCL pin, cu130 torch index URL, and LD_LIBRARY_PATH setup are preserved.

Signed-off-by: key4ng <rukeyang@gmail.com>

coderabbitai · 2026-05-17T06:35:29Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b3d2347c-2ad3-41da-b301-fb6672a9fbed

📥 Commits

Reviewing files that changed from the base of the PR and between 665ad79 and 35a1b8e.

📒 Files selected for processing (1)

scripts/ci_install_trtllm.sh

📝 Walkthrough

Walkthrough

Composite GitHub Action and CI script changed to install a TensorRT-LLM pre-release wheel from PyPI (TRTLLM_VERSION=1.3.0rc14). Wheel-cache restore/save removed; script now sets CUDA/NCCL runtime, activates optional .venv, installs runtime dependencies and the PyPI pre-release wheel, and verifies gRPC serve help.

Changes

TRT-LLM PyPI Installation

Layer / File(s)	Summary
GitHub Action definition update `.github/actions/setup-trtllm/action.yml`	Composite action description updated from "restore/save TRT-LLM wheel cache" to "install TensorRT-LLM pre-release wheel from PyPI"; cache restore and save steps removed, leaving only the venv creation and install script invocation.
Installation script rewrite for PyPI wheels `scripts/ci_install_trtllm.sh`	Script switched from source build to PyPI pre-release wheel installation. Fixed version `TRTLLM_VERSION="1.3.0rc14"` introduced; optional `.venv` activation added; runtime-only CUDA/TensorRT dependencies installed instead of full build toolchain; pip upgraded and NCCL runtime installed; TensorRT-LLM wheel fetched from PyPI using `--pre` flag and `cu130` PyTorch extra index; gRPC serve verification updated to explicitly test `serve --help`; completion message changed to reference PyPI installation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

lightseekorg/smg#319: Introduced the setup-trtllm composite action with wheel caching around scripts/ci_install_trtllm.sh; this PR removes that caching and switches to PyPI wheels.
lightseekorg/smg#940: Also modifies scripts/ci_install_trtllm.sh to use cu130 PyTorch extra index and aligns NCCL constraints for wheel compatibility.
lightseekorg/smg#779: Prior changes to scripts/ci_install_trtllm.sh NCCL handling around wheel version matching.

Suggested reviewers

gongwei-130
slin1237
XinyueZhang369
CatherineSue

Poem

🐇 The wheels hop in from PyPI's field,
No source-build toil, no cache to yield,
1.3.0rc14 arrives on cue,
CUDA, NCCL set paths anew,
gRPC says "serve --help" — all checks revealed.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and accurately summarizes the main change: replacing a source build with PyPI wheel installation for TRT-LLM in CI.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

claude

Clean CI simplification. Version is pinned, runtime deps preserved, --pre + --extra-index-url flags are correct, verification steps retained. LGTM.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/ci_install_trtllm.sh`:
- Line 88: The smoke-test command currently swallows failures by appending '||
echo "WARNING: serve --help failed"' to the pipeline; remove that fallback so a
failing `python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20`
returns a non-zero exit and fails CI (i.e., delete the `|| echo "WARNING: serve
--help failed"` part so the command fails hard on error).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52dcf520-0190-4606-a0e4-0411aeba7787

📥 Commits

Reviewing files that changed from the base of the PR and between 9557d14 and 665ad79.

📒 Files selected for processing (2)

.github/actions/setup-trtllm/action.yml
scripts/ci_install_trtllm.sh

coderabbitai · 2026-05-17T06:37:46Z

-
-# Smoke-test: verify the serve command can parse --help without crashing
 echo "Verifying gRPC serve command..."
 python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20 || echo "WARNING: serve --help failed"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the script currently masks verification failures with `|| echo`.
# Expected result: one match on the serve --help line containing `|| echo`.
rg -n --type=sh -C2 'serve --help.*\|\|.*WARNING' scripts/ci_install_trtllm.sh

Repository: lightseekorg/smg

Length of output: 374

🏁 Script executed:

#!/bin/bash
# Check if script has set -euo pipefail
head -20 scripts/ci_install_trtllm.sh | grep -n "set -"

Repository: lightseekorg/smg

Length of output: 81

Do not swallow smoke-test failures in CI verification.

Line 88 converts a failed serve --help check into a warning, allowing the job to pass despite a broken TRT-LLM install. Remove the || echo fallback to make this check fail hard.

Proposed fix

- python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20 || echo "WARNING: serve --help failed"
+ python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20
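
The difference between the two forms can be reproduced without TRT-LLM installed. The sketch below uses a hypothetical broken_check function in place of the real serve --help call, and assumes the script runs under set -euo pipefail as the analysis above indicates.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a broken `serve --help`: prints output, then fails.
broken_check() {
    echo "usage: demo"
    return 1
}

# With the fallback, the failed pipeline is turned into a warning (status 0).
broken_check 2>&1 | head -20 || echo "WARNING: check failed"
masked_status=$?

# Without the fallback, pipefail propagates the failure. A subshell is used
# so this demo can record the status instead of aborting.
set +e
( set -euo pipefail; broken_check 2>&1 | head -20 )
strict_status=$?
set -e

echo "masked=${masked_status} strict=${strict_status}"
```

The last line prints masked=0 strict=1: only the form without the fallback would stop a CI job.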

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 665ad79649

chatgpt-codex-connector · 2026-05-17T06:37:50Z

+pip install --no-cache-dir --pre \
+    --extra-index-url https://download.pytorch.org/whl/cu130 \
+    "tensorrt-llm==${TRTLLM_VERSION}"


Install the actual TensorRT-LLM wheel

In the clean CI runners this command only searches PyPI plus the PyTorch CUDA index, but the PyPI metadata for tensorrt-llm==1.3.0rc14 currently exposes only the tiny source distribution tensorrt_llm-1.3.0rc14.tar.gz and no prebuilt wheel. That means this no longer installs the compiled TensorRT-LLM artifact the next verification imports (tensorrt_llm.commands.serve), so every setup-trtllm job will fail or install an unusable placeholder unless the NVIDIA wheel index/direct wheel is included.

gemini-code-assist

Code Review

This pull request refactors the CI pipeline to install TensorRT-LLM directly from a PyPI pre-release wheel, significantly reducing build times by removing the need to compile from source and manage local wheel caches. The changes simplify scripts/ci_install_trtllm.sh by stripping out source cloning, CMake patching, and build-specific dependencies. Feedback was provided to improve the robustness of LD_LIBRARY_PATH exports by using conditional expansion to avoid leading or trailing colons, which can pose security risks by accidentally including the current working directory in the library search path.

gemini-code-assist · 2026-05-17T06:38:12Z

-    source .venv/bin/activate
-fi
 export PATH="$CUDA_HOME/bin:$PATH"
 export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64:${LD_LIBRARY_PATH:-}"


This construction of LD_LIBRARY_PATH leaves a trailing colon when the variable was previously empty or unset. The dynamic linker treats an empty entry (a leading or trailing colon) in LD_LIBRARY_PATH as the current working directory, which is generally considered a security risk and can lead to unexpected library resolution. It is safer to use the ${VAR:+:${VAR}} syntax, which appends the colon and the old value only if the variable is already non-empty.

Suggested change

export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64:${LD_LIBRARY_PATH:-}"

export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
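
The two expansions can be compared side by side. This is a minimal sketch; the CUDA_HOME_DEMO value, the prepend_path helper, and the UNSET_DEMO_VAR name are hypothetical and exist only for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

CUDA_HOME_DEMO=/usr/local/cuda   # hypothetical install location

# Prepend $1 to the existing list $2, adding a colon only when $2 is non-empty.
prepend_path() {
    local prefix=$1 existing=${2:-}
    printf '%s\n' "${prefix}${existing:+:${existing}}"
}

empty_case=$(prepend_path "${CUDA_HOME_DEMO}/lib64" "")
nonempty_case=$(prepend_path "${CUDA_HOME_DEMO}/lib64" "/opt/lib")

# The original form always emits the colon, even when nothing follows it.
naive_case="${CUDA_HOME_DEMO}/lib64:${UNSET_DEMO_VAR:-}"

echo "conditional, empty:     ${empty_case}"
echo "conditional, non-empty: ${nonempty_case}"
echo "naive, empty:           ${naive_case}"
```

Only the naive form ends in a colon when the previous value is empty.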

gemini-code-assist · 2026-05-17T06:38:12Z

-# ── Add pip-installed NVIDIA libraries to LD_LIBRARY_PATH ────────────────────
 NVIDIA_LIB_DIRS=$(find "$SITE_PACKAGES/nvidia" -name "lib" -type d 2>/dev/null | sort -u | paste -sd':')
 if [ -n "$NVIDIA_LIB_DIRS" ]; then
    export LD_LIBRARY_PATH="${NVIDIA_LIB_DIRS}:${LD_LIBRARY_PATH:-}"


As with the previous comment, this construction leaves a trailing colon if LD_LIBRARY_PATH was empty or unset. Using the conditional expansion syntax keeps the path list clean and avoids accidentally adding the current directory to the search path.

Suggested change

export LD_LIBRARY_PATH="${NVIDIA_LIB_DIRS}:${LD_LIBRARY_PATH:-}"

export LD_LIBRARY_PATH="${NVIDIA_LIB_DIRS}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
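
For context, the find, sort, and paste pipeline that builds NVIDIA_LIB_DIRS can be tried on a throwaway directory tree. This is a sketch under stated assumptions: the temp directory stands in for $SITE_PACKAGES, and the cublas and nccl subdirectories and the *_demo variable names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for $SITE_PACKAGES with two pip-style NVIDIA packages.
site_packages_demo=$(mktemp -d)
trap 'rm -rf "$site_packages_demo"' EXIT
mkdir -p "$site_packages_demo/nvidia/cublas/lib" \
         "$site_packages_demo/nvidia/nccl/lib" \
         "$site_packages_demo/nvidia/nccl/include"

# Collect every directory named "lib" and join the results with colons.
nvidia_lib_dirs=$(find "$site_packages_demo/nvidia" -name "lib" -type d | sort -u | paste -sd: -)

# Prepend to an existing value without leaving an empty entry behind.
existing_demo=""
ld_path_demo="${nvidia_lib_dirs}${existing_demo:+:${existing_demo}}"

echo "$ld_path_demo"
```

The include directory is skipped because only directories named lib match, and the result has no trailing colon even though existing_demo is empty.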

PyPI only ships the tensorrt-llm source tarball, so a plain pip install tensorrt-llm==1.3.0rc14 triggers a full source build, defeating the purpose of switching off the build_wheel.py path. The pre-built linux_x86_64 wheel (2.75 GB) lives at https://pypi.nvidia.com/tensorrt-llm/, which pip needs as an extra index to resolve.

Signed-off-by: key4ng <rukeyang@gmail.com>
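
One way to keep pip from silently falling back to the source tarball is to forbid source builds for this package. The sketch below is not the PR's actual script. It assumes https://pypi.nvidia.com is the right extra index root for the wheel location quoted above, and RUN_INSTALL is a hypothetical switch so the command is only printed by default. The --only-binary option is standard pip and makes the install fail fast when no wheel can be resolved.

```shell
#!/usr/bin/env bash
set -euo pipefail

TRTLLM_VERSION="1.3.0rc14"

# Build the argument list once so it can be printed or executed.
pip_args=(
    install --no-cache-dir --pre
    --only-binary=tensorrt-llm                              # refuse the sdist
    --extra-index-url https://download.pytorch.org/whl/cu130
    --extra-index-url https://pypi.nvidia.com               # assumed index root
    "tensorrt-llm==${TRTLLM_VERSION}"
)

if [ "${RUN_INSTALL:-0}" = "1" ]; then
    pip "${pip_args[@]}"
else
    # Dry run: show the command that would be executed.
    printf 'pip'
    printf ' %q' "${pip_args[@]}"
    printf '\n'
fi
```

Running it with RUN_INSTALL=1 performs the real install; without it, the script only echoes the command, which is convenient for reviewing the flags in a PR.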

key4ng requested review from CatherineSue, XinyueZhang369, gongwei-130 and slin1237 as code owners May 17, 2026 06:35

github-actions Bot added the ci CI/CD configuration changes label May 17, 2026

claude Bot approved these changes May 17, 2026

coderabbitai Bot requested changes May 17, 2026

chatgpt-codex-connector Bot reviewed May 17, 2026

gemini-code-assist Bot reviewed May 17, 2026

key4ng wants to merge 2 commits into main from keyang/trtllm-ci-prerelease-wheel
