ci(trtllm): install pre-release wheel from PyPI instead of building from source#1501
ci(trtllm): install pre-release wheel from PyPI instead of building from source#1501key4ng wants to merge 2 commits into
Conversation
…rom source PR #11037 (gRPC serve) and the Harmony parser fixes that motivated the source build (#12045, #12467) are all included in tensorrt-llm 1.3.0rc14 on PyPI. Pinning to the published pre-release saves ~30 min of CMake compile time per CI run and removes the brittle wheel-cache dance whose key was evicted after 7 days of inactivity. Drop the cached-wheel + source-build branches from ci_install_trtllm.sh and the now-redundant cache restore/save steps from the composite action. The runtime apt deps, NCCL pin, cu130 torch index URL, and LD_LIBRARY_PATH setup are preserved. Signed-off-by: key4ng <rukeyang@gmail.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughComposite GitHub Action and CI script changed to install a TensorRT-LLM pre-release wheel from PyPI (TRTLLM_VERSION=1.3.0rc14). Wheel-cache restore/save removed; script now sets CUDA/NCCL runtime, activates optional ChangesTRT-LLM PyPI Installation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/ci_install_trtllm.sh`:
- Line 88: The smoke-test command currently swallows failures by appending '||
echo "WARNING: serve --help failed"' to the pipeline; remove that fallback so a
failing `python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20`
returns a non-zero exit and fails CI (i.e., delete the `|| echo "WARNING: serve
--help failed"` part so the command fails hard on error).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 52dcf520-0190-4606-a0e4-0411aeba7787
📒 Files selected for processing (2)
.github/actions/setup-trtllm/action.ymlscripts/ci_install_trtllm.sh
|
|
||
| # Smoke-test: verify the serve command can parse --help without crashing | ||
| echo "Verifying gRPC serve command..." | ||
| python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20 || echo "WARNING: serve --help failed" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the script currently masks verification failures with `|| echo`.
# Expected result: one match on the serve --help line containing `|| echo`.
rg -n --type=sh -C2 'serve --help.*\|\|.*WARNING' scripts/ci_install_trtllm.shRepository: lightseekorg/smg
Length of output: 374
🏁 Script executed:
#!/bin/bash
# Check if script has set -euo pipefail
head -20 scripts/ci_install_trtllm.sh | grep -n "set -"Repository: lightseekorg/smg
Length of output: 81
Do not swallow smoke-test failures in CI verification.
Line 88 converts a failed serve --help check into a warning, allowing the job to pass despite a broken TRT-LLM install. Remove the || echo fallback to make this check fail hard.
Proposed fix
- python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20 || echo "WARNING: serve --help failed"
+ python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20 || echo "WARNING: serve --help failed" | |
| python3 -m tensorrt_llm.commands.serve serve --help 2>&1 | head -20 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/ci_install_trtllm.sh` at line 88, The smoke-test command currently
swallows failures by appending '|| echo "WARNING: serve --help failed"' to the
pipeline; remove that fallback so a failing `python3 -m
tensorrt_llm.commands.serve serve --help 2>&1 | head -20` returns a non-zero
exit and fails CI (i.e., delete the `|| echo "WARNING: serve --help failed"`
part so the command fails hard on error).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 665ad79649
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pip install --no-cache-dir --pre \ | ||
| --extra-index-url https://download.pytorch.org/whl/cu130 \ | ||
| "tensorrt-llm==${TRTLLM_VERSION}" |
There was a problem hiding this comment.
Install the actual TensorRT-LLM wheel
In the clean CI runners this command only searches PyPI plus the PyTorch CUDA index, but the PyPI metadata for tensorrt-llm==1.3.0rc14 currently exposes only the tiny source distribution tensorrt_llm-1.3.0rc14.tar.gz and no prebuilt wheel. That means this no longer installs the compiled TensorRT-LLM artifact the next verification imports (tensorrt_llm.commands.serve), so every setup-trtllm job will fail or install an unusable placeholder unless the NVIDIA wheel index/direct wheel is included.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request refactors the CI pipeline to install TensorRT-LLM directly from a PyPI pre-release wheel, significantly reducing build times by removing the need to compile from source and manage local wheel caches. The changes simplify scripts/ci_install_trtllm.sh by stripping out source cloning, CMake patching, and build-specific dependencies. Feedback was provided to improve the robustness of LD_LIBRARY_PATH exports by using conditional expansion to avoid leading or trailing colons, which can pose security risks by accidentally including the current working directory in the library search path.
| source .venv/bin/activate | ||
| fi | ||
| export PATH="$CUDA_HOME/bin:$PATH" | ||
| export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64:${LD_LIBRARY_PATH:-}" |
There was a problem hiding this comment.
The current construction of LD_LIBRARY_PATH can result in a trailing colon if the variable was previously empty or unset. In shell, a trailing or leading colon in LD_LIBRARY_PATH is interpreted as including the current working directory (.), which is generally considered a security risk and can lead to unexpected library resolution. It is safer to use the ${VAR:+:${VAR}} syntax to only append the colon if the variable is already non-empty.
| export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64:${LD_LIBRARY_PATH:-}" | |
| export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" |
| # ── Add pip-installed NVIDIA libraries to LD_LIBRARY_PATH ──────────────────── | ||
| NVIDIA_LIB_DIRS=$(find "$SITE_PACKAGES/nvidia" -name "lib" -type d 2>/dev/null | sort -u | paste -sd':') | ||
| if [ -n "$NVIDIA_LIB_DIRS" ]; then | ||
| export LD_LIBRARY_PATH="${NVIDIA_LIB_DIRS}:${LD_LIBRARY_PATH:-}" |
There was a problem hiding this comment.
Similar to the previous comment, this construction can introduce a leading or trailing colon if LD_LIBRARY_PATH was empty. Using the conditional expansion syntax ensures a clean path list without accidentally including the current directory in the search path.
| export LD_LIBRARY_PATH="${NVIDIA_LIB_DIRS}:${LD_LIBRARY_PATH:-}" | |
| export LD_LIBRARY_PATH="${NVIDIA_LIB_DIRS}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" |
PyPI only ships the tensorrt-llm source tarball, so a plain pip install tensorrt-llm==1.3.0rc14 triggers a full source build — defeating the purpose of switching off the build_wheel.py path. The pre-built linux_x86_64 wheel (2.75 GB) lives at https://pypi.nvidia.com/tensorrt-llm/, which pip needs as an extra index to resolve. Signed-off-by: key4ng <rukeyang@gmail.com>
Description
Problem
The
e2e-1gpu-chat (trtllm)CI step has been hanging for ~30 min on every PR because Setup TRT-LLM backend falls into the from-source build path. Root cause:.github/actions/setup-trtllm/action.ymlkeys the wheel cache onhashFiles('scripts/ci_install_trtllm.sh').trtllm-wheel-*(out of 2476 total).main, runsgit lfs pull, and invokesscripts/build_wheel.py --cuda_architectures "90-real"— a full CMake C++ compile.mainitself has a warm cache.Example: https://github.com/lightseekorg/smg/actions/runs/25981764002/job/76372164127 stuck >34 min on "Setup TRT-LLM backend" while every other step was pending.
Solution
Install the published
tensorrt-llm==1.3.0rc14pre-release wheel from PyPI. Verification that the fixes which motivated the source build are present in rc14:Saves ~30 min/run and removes the brittle wheel-cache dependency entirely.
Changes
scripts/ci_install_trtllm.sh: drop the cached-wheel short-circuit and the from-source build path; replace with a singlepip install --pre --extra-index-url https://download.pytorch.org/whl/cu130 tensorrt-llm==1.3.0rc14. Runtime apt deps (libnvinfer10, cuda-toolkit-13-0, libopenmpi-dev), NCCL pin, CUDA env setup,LD_LIBRARY_PATHpropagation, and the gRPCservesmoke test are preserved..github/actions/setup-trtllm/action.yml: remove the now-redundantactions/cache/restoreandactions/cache/savesteps.Net diff: +28 / −304.
Caveats
model_gatewayor any crate is touched.Test Plan
CI must run the trtllm e2e job on this PR. Specifically:
e2e-1gpu-chat (trtllm) / runpasses.Setup TRT-LLM backendcompletes in well under 30 min (target: a few minutes, dominated by apt + pip download).python3 -m tensorrt_llm.commands.serve serve --helpsmoke test still succeeds (proves PR #11037 gRPC serve is present in the installed wheel).Checklist
cargo +nightly fmtpasses (no Rust touched)cargo clippy --all-targets --all-features -- -D warningspasses (no Rust touched)Summary by CodeRabbit