
ci(trtllm): install pre-release wheel from PyPI instead of building from source #1501

Open
key4ng wants to merge 2 commits into main from keyang/trtllm-ci-prerelease-wheel

Conversation

Collaborator

@key4ng key4ng commented May 17, 2026

Description

Problem

The e2e-1gpu-chat (trtllm) CI step has been hanging for ~30 min on every PR because Setup TRT-LLM backend falls into the from-source build path. Root cause:

  • .github/actions/setup-trtllm/action.yml keys the wheel cache on hashFiles('scripts/ci_install_trtllm.sh').
  • GitHub Actions evicts caches after 7 days of no access. The repo currently has 0 caches matching trtllm-wheel-* (out of 2476 total).
  • On cache miss, the script clones TensorRT-LLM main, runs git lfs pull, and invokes scripts/build_wheel.py --cuda_architectures "90-real" — a full CMake C++ compile.
  • Even when a PR run rebuilds and saves the cache, that cache lives on the PR branch only. Actions cache scoping means subsequent PR branches can't read it unless main itself has a warm cache.

Example: https://github.com/lightseekorg/smg/actions/runs/25981764002/job/76372164127 stuck >34 min on "Setup TRT-LLM backend" while every other step was pending.

Solution

Install the published tensorrt-llm==1.3.0rc14 pre-release wheel from PyPI. Verification that the fixes which motivated the source build are present in rc14:

| TRT-LLM PR | Title | In rc14 |
|---|---|---|
| #11037 | gRPC server for high-performance external router integration | YES (since v1.3.0rc2) |
| #12045 | Fix harmony parsers for agentic coding | YES |
| #12467 | Harmony Parser Delta Grouping + Reuse Report | YES |

Saves ~30 min/run and removes the brittle wheel-cache dependency entirely.

Changes

  • scripts/ci_install_trtllm.sh: drop the cached-wheel short-circuit and the from-source build path; replace with a single pip install --pre --extra-index-url https://download.pytorch.org/whl/cu130 tensorrt-llm==1.3.0rc14. Runtime apt deps (libnvinfer10, cuda-toolkit-13-0, libopenmpi-dev), NCCL pin, CUDA env setup, LD_LIBRARY_PATH propagation, and the gRPC serve smoke test are preserved.
  • .github/actions/setup-trtllm/action.yml: remove the now-redundant actions/cache/restore and actions/cache/save steps.

Net diff: +28 / −304.

Caveats

  • Two newer GPT-OSS router fixes — #13675 (2026-05-04) and #13798 (2026-05-10) — landed after rc14 was cut and are not in this wheel. If the e2e suite hits those code paths we'll see regressions; the next pre-release should pick them up.
  • This is a CI-only change — no runtime/library code in model_gateway or any crate is touched.

Test Plan

CI must run the trtllm e2e job on this PR. Specifically:

  • e2e-1gpu-chat (trtllm) / run passes.
  • Setup TRT-LLM backend completes in well under 30 min (target: a few minutes, dominated by apt + pip download).
  • python3 -m tensorrt_llm.commands.serve serve --help smoke test still succeeds (proves PR #11037 gRPC serve is present in the installed wheel).
  • Harmony / gpt-oss test classes enabled by #801 (test(e2e): enable Harmony tests on vLLM and TRT-LLM backends) still pass on the trtllm backend.

Checklist

  • cargo +nightly fmt passes (no Rust touched)
  • cargo clippy --all-targets --all-features -- -D warnings passes (no Rust touched)
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • Chores
    • CI setup simplified: TensorRT-LLM is now installed as a pre-release wheel from PyPI instead of being built or cached locally.
    • CI environment and install steps streamlined (venv creation, dependency installation, library path configuration) and CI verification messaging clarified to reflect the PyPI-based installation.

…rom source

PR #11037 (gRPC serve) and the Harmony parser fixes that motivated the
source build (#12045, #12467) are all included in tensorrt-llm 1.3.0rc14
on PyPI. Pinning to the published pre-release saves ~30 min of CMake
compile time per CI run and removes the brittle wheel-cache dance whose
key was evicted after 7 days of inactivity.

Drop the cached-wheel + source-build branches from ci_install_trtllm.sh
and the now-redundant cache restore/save steps from the composite
action. The runtime apt deps, NCCL pin, cu130 torch index URL, and
LD_LIBRARY_PATH setup are preserved.

Signed-off-by: key4ng <rukeyang@gmail.com>

coderabbitai Bot commented May 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b3d2347c-2ad3-41da-b301-fb6672a9fbed

📥 Commits

Reviewing files that changed from the base of the PR and between 665ad79 and 35a1b8e.

📒 Files selected for processing (1)
  • scripts/ci_install_trtllm.sh

📝 Walkthrough

Composite GitHub Action and CI script changed to install a TensorRT-LLM pre-release wheel from PyPI (TRTLLM_VERSION=1.3.0rc14). Wheel-cache restore/save removed; script now sets CUDA/NCCL runtime, activates optional .venv, installs runtime dependencies and the PyPI pre-release wheel, and verifies gRPC serve help.

Changes

TRT-LLM PyPI Installation

| Layer / File(s) | Summary |
|---|---|
| GitHub Action definition update (.github/actions/setup-trtllm/action.yml) | Composite action description updated from "restore/save TRT-LLM wheel cache" to "install TensorRT-LLM pre-release wheel from PyPI"; cache restore and save steps removed, leaving only the venv creation and install script invocation. |
| Installation script rewrite for PyPI wheels (scripts/ci_install_trtllm.sh) | Script switched from source build to PyPI pre-release wheel installation. Fixed version TRTLLM_VERSION="1.3.0rc14" introduced; optional .venv activation added; runtime-only CUDA/TensorRT dependencies installed instead of full build toolchain; pip upgraded and NCCL runtime installed; TensorRT-LLM wheel fetched from PyPI using --pre flag and cu130 PyTorch extra index; gRPC serve verification updated to explicitly test serve --help; completion message changed to reference PyPI installation. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • lightseekorg/smg#319: Introduced the setup-trtllm composite action with wheel caching around scripts/ci_install_trtllm.sh; this PR removes that caching and switches to PyPI wheels.
  • lightseekorg/smg#940: Also modifies scripts/ci_install_trtllm.sh to use cu130 PyTorch extra index and aligns NCCL constraints for wheel compatibility.
  • lightseekorg/smg#779: Prior changes to scripts/ci_install_trtllm.sh NCCL handling around wheel version matching.

Suggested reviewers

  • gongwei-130
  • slin1237
  • XinyueZhang369
  • CatherineSue

Poem

🐇 The wheels hop in from PyPI's field,
No source-build toil, no cache to yield,
1.3.0rc14 arrives on cue,
CUDA, NCCL set paths anew,
gRPC says "serve --help" — all checks revealed.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: replacing a source build with PyPI wheel installation for TRT-LLM in CI. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@github-actions github-actions Bot added the ci CI/CD configuration changes label May 17, 2026

@claude claude Bot left a comment


Clean CI simplification. Version is pinned, runtime deps preserved, --pre + --extra-index-url flags are correct, verification steps retained. LGTM.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/ci_install_trtllm.sh`:
- Line 88: The smoke-test command currently swallows failures by appending '||
echo "WARNING: serve --help failed"' to the pipeline; remove that fallback so a
failing `python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20`
returns a non-zero exit and fails CI (i.e., delete the `|| echo "WARNING: serve
--help failed"` part so the command fails hard on error).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52dcf520-0190-4606-a0e4-0411aeba7787

📥 Commits

Reviewing files that changed from the base of the PR and between 9557d14 and 665ad79.

📒 Files selected for processing (2)
  • .github/actions/setup-trtllm/action.yml
  • scripts/ci_install_trtllm.sh


# Smoke-test: verify the serve command can parse --help without crashing
echo "Verifying gRPC serve command..."
python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20 || echo "WARNING: serve --help failed"

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the script currently masks verification failures with `|| echo`.
# Expected result: one match on the serve --help line containing `|| echo`.
rg -n --type=sh -C2 'serve --help.*\|\|.*WARNING' scripts/ci_install_trtllm.sh

Repository: lightseekorg/smg

Length of output: 374


🏁 Script executed:

#!/bin/bash
# Check if script has set -euo pipefail
head -20 scripts/ci_install_trtllm.sh | grep -n "set -"

Repository: lightseekorg/smg

Length of output: 81


Do not swallow smoke-test failures in CI verification.

Line 88 converts a failed serve --help check into a warning, allowing the job to pass despite a broken TRT-LLM install. Remove the || echo fallback to make this check fail hard.

Proposed fix
- python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20 || echo "WARNING: serve --help failed"
+ python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20
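
Whether dropping the fallback is enough depends on `set -o pipefail`: without it, the status of the pipeline is the status of `head`, which still hides a failing serve command. The following is a minimal sketch of the three cases, not the script's actual code. `broken_serve_help` is a hypothetical stand-in for a failing `python3 -m tensorrt_llm.commands.serve serve --help`, so the sketch runs without TRT-LLM installed.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a broken `serve --help` invocation.
broken_serve_help() {
    echo "ImportError: simulated failure"
    return 1
}

# 1) pipefail on, fallback present: `|| echo` turns the failure into success.
set -o pipefail
broken_serve_help 2>&1 | head -20 || echo "WARNING: serve --help failed"
status_with_fallback=$?

# 2) pipefail on, fallback removed: the failure propagates.
broken_serve_help 2>&1 | head -20
status_pipefail=$?

# 3) pipefail off, fallback removed: the exit status of `head` (0) wins.
set +o pipefail
broken_serve_help 2>&1 | head -20
status_no_pipefail=$?

echo "fallback=$status_with_fallback pipefail=$status_pipefail no_pipefail=$status_no_pipefail"
```

Run as written, the last line prints `fallback=0 pipefail=1 no_pipefail=0`. Only case 2 would fail a CI step, so the proposed fix relies on the script keeping `pipefail` enabled.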


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 665ad79649


Comment on lines +62 to +64
pip install --no-cache-dir --pre \
--extra-index-url https://download.pytorch.org/whl/cu130 \
"tensorrt-llm==${TRTLLM_VERSION}"

P1: Install the actual TensorRT-LLM wheel

In the clean CI runners this command only searches PyPI plus the PyTorch CUDA index, but the PyPI metadata for tensorrt-llm==1.3.0rc14 currently exposes only the tiny source distribution tensorrt_llm-1.3.0rc14.tar.gz and no prebuilt wheel. That means this no longer installs the compiled TensorRT-LLM artifact the next verification imports (tensorrt_llm.commands.serve), so every setup-trtllm job will fail or install an unusable placeholder unless the NVIDIA wheel index/direct wheel is included.


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the CI pipeline to install TensorRT-LLM directly from a PyPI pre-release wheel, significantly reducing build times by removing the need to compile from source and manage local wheel caches. The changes simplify scripts/ci_install_trtllm.sh by stripping out source cloning, CMake patching, and build-specific dependencies. Feedback was provided to improve the robustness of LD_LIBRARY_PATH exports by using conditional expansion to avoid leading or trailing colons, which can pose security risks by accidentally including the current working directory in the library search path.

source .venv/bin/activate
fi
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64:${LD_LIBRARY_PATH:-}"

medium

The current construction of LD_LIBRARY_PATH results in a trailing colon if the variable was previously empty or unset. The dynamic linker treats an empty entry in LD_LIBRARY_PATH, such as the one a leading or trailing colon produces, as the current working directory (.), which is generally considered a security risk and can lead to unexpected library resolution. It is safer to use the ${VAR:+:${VAR}} syntax so that the colon is appended only when the variable is already non-empty.

Suggested change
- export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64:${LD_LIBRARY_PATH:-}"
+ export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# ── Add pip-installed NVIDIA libraries to LD_LIBRARY_PATH ────────────────────
NVIDIA_LIB_DIRS=$(find "$SITE_PACKAGES/nvidia" -name "lib" -type d 2>/dev/null | sort -u | paste -sd':')
if [ -n "$NVIDIA_LIB_DIRS" ]; then
export LD_LIBRARY_PATH="${NVIDIA_LIB_DIRS}:${LD_LIBRARY_PATH:-}"

medium

Similar to the previous comment, this construction can introduce a leading or trailing colon if LD_LIBRARY_PATH was empty. Using the conditional expansion syntax ensures a clean path list without accidentally including the current directory in the search path.

Suggested change
- export LD_LIBRARY_PATH="${NVIDIA_LIB_DIRS}:${LD_LIBRARY_PATH:-}"
+ export LD_LIBRARY_PATH="${NVIDIA_LIB_DIRS}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
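
The difference between the two expansions can be checked in isolation. This is a small sketch, assuming a hypothetical `CUDA_HOME` of `/usr/local/cuda` and a clean runner where `LD_LIBRARY_PATH` is unset. It only builds strings and does not export anything.

```bash
#!/usr/bin/env bash
# Hypothetical CUDA location, used only for illustration.
CUDA_HOME="/usr/local/cuda"

# Simulate a clean runner where the variable is not set at all.
unset LD_LIBRARY_PATH

# Original form: always appends ":" followed by the (possibly empty) old value.
with_default="${CUDA_HOME}/lib64:${LD_LIBRARY_PATH:-}"

# Suggested form: appends ":<old value>" only when the old value is non-empty.
with_conditional="${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

echo "default:     $with_default"
echo "conditional: $with_conditional"

# With a pre-existing value the suggested form keeps it after a single colon.
LD_LIBRARY_PATH="/opt/existing/lib"
with_existing="${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
echo "existing:    $with_existing"
```

On an unset variable the original form yields `/usr/local/cuda/lib64:` with a trailing colon, while the suggested form yields `/usr/local/cuda/lib64`. When a value already exists, both forms produce the same path list.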

PyPI only ships the tensorrt-llm source tarball, so a plain
pip install tensorrt-llm==1.3.0rc14 triggers a full source build —
defeating the purpose of switching off the build_wheel.py path.
The pre-built linux_x86_64 wheel (2.75 GB) lives at
https://pypi.nvidia.com/tensorrt-llm/, which pip needs as an
extra index to resolve.

Signed-off-by: key4ng <rukeyang@gmail.com>
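
One way to guard against pip silently falling back to the source tarball is to forbid source distributions for this package. The sketch below is not the committed script. The NVIDIA index root `https://pypi.nvidia.com` is an assumption inferred from the wheel URL in the commit message, and `DRY_RUN` is a hypothetical switch added so the command can be inspected without downloading the 2.75 GB wheel.

```bash
#!/usr/bin/env bash
set -euo pipefail

TRTLLM_VERSION="1.3.0rc14"

# Build the install command as an array so it can be printed or executed.
# --only-binary makes pip refuse the sdist for this package, so a missing
# wheel fails fast instead of starting a source build.
# The NVIDIA index root is an assumption; adjust it to match the real script.
pip_cmd=(
    pip install --no-cache-dir --pre
    --only-binary tensorrt-llm
    --extra-index-url https://pypi.nvidia.com
    --extra-index-url https://download.pytorch.org/whl/cu130
    "tensorrt-llm==${TRTLLM_VERSION}"
)

# DRY_RUN is a hypothetical switch for this sketch; by default it only prints.
if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s ' "${pip_cmd[@]}"
    printf '\n'
else
    "${pip_cmd[@]}"
fi
```

With `DRY_RUN=0` the command runs for real. If no index offers a matching wheel, pip should then stop with a resolution error rather than compiling from the tarball, which keeps a regression of this kind visible in seconds instead of after a 30-minute build.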
