Summary
The current PR CI gate duplicates work between build-and-test and vpto-sim-validation, which makes the overall gate slower than necessary.
Today these two workflows both rebuild the same PTOAS toolchain stack and also duplicate part of the validation surface:
- both prepare/restore/build LLVM
- both build/install PTOAS
- both run overlapping frontend/runtime smoke coverage around PTODSL / TileLang / end-to-end compilation flows
- tilelang-dsl unit coverage is also duplicated
This issue proposes:
- reusing build-and-test build artifacts in vpto-sim-validation
- removing low-value duplicated checks
- keeping only the simulator-specific incremental coverage in vpto-sim-validation
Motivation / use case
Reproduction input
PR CI workflows:
- .github/workflows/ci.yml
- .github/workflows/ci_sim.yml
Relevant duplicated build sections:
- build-and-test in ci.yml
  - resolves/restores/builds LLVM
  - builds PTOAS
  - runs PTODSL tests, lit, sample tests
- vpto-sim-validation in ci_sim.yml
  - resolves/restores/builds LLVM again
  - builds PTOAS again
  - runs PyPTO smoke, VPTO SIM validation, TileLang ST, PTODSL DSL ST, tilelang-dsl unittest
Some concrete overlap points:
- LLVM + PTOAS are built twice
  - ci.yml build-and-test
  - ci_sim.yml vpto-sim-validation
- tilelang-dsl unit coverage is duplicated
  - ctest -L PTODSL in build-and-test
  - standalone tilelang-dsl unittest in vpto-sim-validation
- PTODSL / TileLang / end-to-end smoke coverage is partially overlapping
  - sample / frontend / compile-path validation in build-and-test
  - simulator-side end-to-end smoke in vpto-sim-validation
Expected performance
The PR gate should avoid rebuilding the same LLVM/PTOAS stack twice when the second workflow can consume artifacts from the first one.
Expected improvements:
- vpto-sim-validation should reuse PTOAS artifacts from build-and-test
- duplicated tilelang-dsl unit coverage should be removed from one side
- vpto-sim-validation should retain only simulator-specific incremental validation
- total PR CI wall time should drop noticeably, especially for PRs that trigger both workflows
Actual performance
At the moment:
- build-and-test builds LLVM + PTOAS once
- vpto-sim-validation builds LLVM + PTOAS again
- some validation layers are repeated with limited incremental value
- PRs that touch VPTO / PTODSL / TileLang related paths pay for two heavyweight pipelines
Proposed solution
1. Reuse build-and-test artifacts in vpto-sim-validation
Have build-and-test upload a CI artifact containing at least:
- PTOAS install tree or executable payload
- any runtime libraries needed by the self-hosted simulator jobs
- optionally a minimal manifest with commit / build metadata
Candidate artifact content:
- install-assert/
- or build-assert/tools/ptoas/ptoas plus required shared libs / Python package files
Then make vpto-sim-validation download and use that artifact directly instead of rebuilding PTOAS.
This removes the second PTOAS build from ci_sim.yml.
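The optional manifest mentioned above could be used to make the handoff safe: build-and-test records what it built, and vpto-sim-validation refuses an artifact that does not match the commit under test. The following is a minimal sketch under assumptions, not existing project tooling. The file name `manifest.json`, its fields, and the helper names `write_manifest` / `verify_manifest` are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import List

# Hypothetical file name; the issue only asks for "a minimal manifest".
MANIFEST_NAME = "manifest.json"


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(root: Path, commit: str) -> Path:
    """Producer side (build-and-test): record the commit and per-file digests."""
    files = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != MANIFEST_NAME
    }
    manifest = root / MANIFEST_NAME
    manifest.write_text(json.dumps({"commit": commit, "files": files}, indent=2))
    return manifest


def verify_manifest(root: Path, expected_commit: str) -> List[str]:
    """Consumer side (vpto-sim-validation): return problems; empty means usable."""
    data = json.loads((root / MANIFEST_NAME).read_text())
    problems = []
    if data["commit"] != expected_commit:
        problems.append(
            f"commit mismatch: artifact={data['commit']} expected={expected_commit}"
        )
    for rel, digest in data["files"].items():
        path = root / rel
        if not path.is_file():
            problems.append(f"missing file: {rel}")
        elif sha256_of(path) != digest:
            problems.append(f"digest mismatch: {rel}")
    return problems
```

In CI the expected commit would come from the workflow context. Which identifier to compare still needs to be settled, because for pull_request events `GITHUB_SHA` refers to the merge commit rather than the PR head. If verification fails, the simulator job could either fail fast or fall back to a local PTOAS build.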
2. Keep LLVM reuse practical, but optimize for PTOAS artifact reuse first
It is acceptable for vpto-sim-validation to keep relying on a self-hosted LLVM cache or a local runner tool cache, but it should not rebuild PTOAS when the exact same commit was already built in build-and-test.
Artifact reuse should be the first optimization target because it is the cleanest cross-workflow deduplication point.
3. Remove duplicated low-value checks
Suggested cleanup:
- keep tilelang-dsl import/unittest coverage in build-and-test only
- remove standalone duplicate tilelang-dsl unittest from vpto-sim-validation
This preserves basic Python-side coverage while avoiding duplicate PR cost.
4. Narrow vpto-sim-validation to simulator-specific incremental coverage
After artifact reuse, vpto-sim-validation should focus on checks that build-and-test cannot provide well, for example:
- VPTO host simulator validation
- PyPTO simulator smoke
- PTODSL DSL ST through simulator / msprof path
- TileLang ST only if it provides simulator-specific value beyond the main CI
In other words, vpto-sim-validation should be the “simulator extension layer”, not a second full build-and-test pipeline.
5. Optional follow-up: split ci_sim by sub-area
A later follow-up can refine path filters so that touching one area does not always trigger all of:
- PyPTO smoke
- VPTO SIM validation
- TileLang ST
- PTODSL DSL ST
This is useful, but artifact reuse and duplicate removal should come first.
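To make the follow-up concrete, the routing can be thought of as a mapping from changed paths to simulator suites. The sketch below only illustrates the idea. The suite identifiers and glob patterns are placeholders invented here, since this issue does not state the repository's directory layout, and the real filters would more likely live in the workflow's own path-filter configuration than in a script.

```python
from fnmatch import fnmatchcase

# Hypothetical suite names and path patterns; replace with the real layout.
SUITE_PATTERNS = {
    "pypto-smoke": ["pypto/*"],
    "vpto-sim-validation": ["vpto/*", "sim/*"],
    "tilelang-st": ["tilelang/*"],
    "ptodsl-dsl-st": ["ptodsl/*"],
}

# Changes to these files should trigger every suite.
RUN_ALL_PATTERNS = [".github/workflows/ci_sim.yml"]


def select_suites(changed_files):
    """Return the sorted list of suites affected by the changed files."""
    if any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in RUN_ALL_PATTERNS
    ):
        return sorted(SUITE_PATTERNS)
    selected = {
        suite
        for suite, patterns in SUITE_PATTERNS.items()
        for path in changed_files
        for pattern in patterns
        if fnmatchcase(path, pattern)
    }
    return sorted(selected)
```

Note that `fnmatchcase` lets `*` match across `/`, so `tilelang/*` also covers nested files. A change that matches no pattern selects no simulator suite, which is only safe if build-and-test still runs for every PR.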
Non-goals
This issue does not propose reusing build-and-test artifacts in build_wheel.
build_wheel has a different packaging/distribution purpose and should remain independently validated in its own environment.
Risks / things to validate
- artifact layout must be stable enough for self-hosted simulator jobs
- required shared libraries / Python paths must be included or reconstructed reliably
- self-hosted runner environment may still need local simulator dependencies
- if ABI / environment assumptions differ between GitHub-hosted and self-hosted runners, artifact packaging must make those differences explicit
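One possible way to make the last risk explicit is to record a small environment fingerprint when the artifact is built and compare it on the self-hosted runner before using the artifact. This is a rough sketch with assumed names (`environment_fingerprint`, `incompatibilities`) and an assumed set of fields. Which properties actually matter for PTOAS binaries would need to be confirmed.

```python
import platform


def environment_fingerprint():
    """Collect coarse properties of the machine this runs on."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "libc": f"{libc_name}-{libc_version}",
        "python": ".".join(platform.python_version_tuple()[:2]),
    }


def incompatibilities(producer, consumer, keys=("system", "machine", "python")):
    """List the keys on which the build and run environments disagree."""
    return [
        f"{key}: built on {producer.get(key)!r}, running on {consumer.get(key)!r}"
        for key in keys
        if producer.get(key) != consumer.get(key)
    ]
```

The libc entry is recorded but left out of the default comparison, because an exact version match is usually stricter than needed. A real check would more likely require the runner's libc to be at least as new as the build machine's.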
Suggested implementation order
- Upload PTOAS artifact from build-and-test
- Make vpto-sim-validation consume that artifact instead of rebuilding PTOAS
- Remove duplicated tilelang-dsl unittest from ci_sim
- Re-evaluate whether TileLang ST / PTODSL DSL ST still have unnecessary overlap
- Optionally split ci_sim path filters into smaller subdomains
Git commit
Current repo HEAD when filing this issue:
<fill with git rev-parse HEAD>
Proposed API / behavior
No response
Alternatives considered
No response
Additional context
No response