[CI] Reuse build-and-test artifacts in vpto-sim-validation and remove duplicated CI work #818

Description

@Zhendong404

Summary

The current PR CI gate duplicates work between build-and-test and vpto-sim-validation, which makes the overall gate slower than necessary.

Today these two workflows both rebuild the same PTOAS toolchain stack and also duplicate part of the validation surface:

  • both prepare/restore/build LLVM
  • both build/install PTOAS
  • both run overlapping frontend/runtime smoke coverage around PTODSL / TileLang / end-to-end compilation flows
  • tilelang-dsl unit coverage is also duplicated

This issue proposes:

  1. reusing build-and-test build artifacts in vpto-sim-validation
  2. removing low-value duplicated checks
  3. keeping only the simulator-specific incremental coverage in vpto-sim-validation

Motivation / use case

Reproduction input

PR CI workflows:

  • .github/workflows/ci.yml
  • .github/workflows/ci_sim.yml

Relevant duplicated build sections:

build-and-test in ci.yml

  • resolves/restores/builds LLVM
  • builds PTOAS
  • runs PTODSL tests, lit, sample tests

vpto-sim-validation in ci_sim.yml

  • resolves/restores/builds LLVM again
  • builds PTOAS again
  • runs PyPTO smoke, VPTO SIM validation, TileLang ST, PTODSL DSL ST, tilelang-dsl unittest

Some concrete overlap points:

  1. LLVM + PTOAS are built twice
     • ci.yml build-and-test
     • ci_sim.yml vpto-sim-validation
  2. tilelang-dsl unit coverage is duplicated
     • ctest -L PTODSL in build-and-test
     • standalone tilelang-dsl unittest in vpto-sim-validation
  3. PTODSL / TileLang / end-to-end smoke coverage is partially overlapping
     • sample / frontend / compile-path validation in build-and-test
     • simulator-side end-to-end smoke in vpto-sim-validation

Expected performance

The PR gate should avoid rebuilding the same LLVM/PTOAS stack twice when the second workflow can consume artifacts from the first one.

Expected improvements:

  • vpto-sim-validation should reuse PTOAS artifacts from build-and-test
  • duplicated tilelang-dsl unit coverage should be removed from one side
  • vpto-sim-validation should retain only simulator-specific incremental validation
  • total PR CI wall time should drop noticeably, especially for PRs that trigger both workflows

Actual performance

At the moment:

  • build-and-test builds LLVM + PTOAS once
  • vpto-sim-validation builds LLVM + PTOAS again
  • some validation layers are repeated with limited incremental value
  • PRs that touch VPTO / PTODSL / TileLang related paths pay for two heavyweight pipelines

Proposed solution

1. Reuse build-and-test artifacts in vpto-sim-validation

Have build-and-test upload a CI artifact containing at least:

  • PTOAS install tree or executable payload
  • any runtime libraries needed by the self-hosted simulator jobs
  • optionally a minimal manifest with commit / build metadata

Candidate artifact content:

  • install-assert/
  • or build-assert/tools/ptoas/ptoas plus required shared libs / Python package files

Then make vpto-sim-validation download and use that artifact directly instead of rebuilding PTOAS.

This removes the second PTOAS build from ci_sim.yml.
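As a rough illustration of the optional manifest, the sketch below records the commit and a SHA-256 checksum for each file in an install tree. The function names, the JSON layout, and the idea of checksumming every file are assumptions made for this example, not something the existing workflows do.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(install_dir: str, commit: str) -> dict:
    """Describe an install tree: the commit plus a SHA-256 digest per file.

    Keys in "files" are POSIX-style paths relative to install_dir.
    The manifest layout is hypothetical.
    """
    root = Path(install_dir)
    files = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            files[path.relative_to(root).as_posix()] = digest
    return {"commit": commit, "files": files}


def write_manifest(install_dir: str, commit: str, out_path: str) -> None:
    """Serialize the manifest as JSON so it can be uploaded with the artifact."""
    manifest = build_manifest(install_dir, commit)
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

build-and-test could call something like `write_manifest("install-assert", "<commit>", "manifest.json")` just before uploading, so the consuming job has a way to tell which commit and which files it received.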

2. Keep LLVM reuse practical, but optimize for PTOAS artifact reuse first

It is acceptable if vpto-sim-validation still relies on a self-hosted LLVM cache or the local runner tool cache, but it should not rebuild PTOAS when the exact same commit was already built in build-and-test.

Artifact reuse should be the first optimization target because it is the cleanest cross-workflow deduplication point.
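The "exact same commit" condition can be made explicit with a small guard. This is only a sketch: it assumes the artifact carries a manifest dict with a "commit" field, and the function name `select_ptoas_source` is hypothetical.

```python
from typing import Optional


def select_ptoas_source(manifest: Optional[dict], head_commit: str) -> str:
    """Return "artifact" when the manifest matches head_commit, else "rebuild"."""
    if manifest is None:
        return "rebuild"  # no artifact was downloaded
    if manifest.get("commit") != head_commit:
        return "rebuild"  # artifact was built from a different commit
    return "artifact"
```

Falling back to a rebuild on any mismatch keeps the simulator job correct even if the artifact is missing or stale, at the cost of the old build time in that case.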

3. Remove duplicated low-value checks

Suggested cleanup:

  • keep tilelang-dsl import/unittest coverage in build-and-test only
  • remove standalone duplicate tilelang-dsl unittest from vpto-sim-validation

This preserves basic Python-side coverage while avoiding duplicate PR cost.

4. Narrow vpto-sim-validation to simulator-specific incremental coverage

After artifact reuse, vpto-sim-validation should focus on checks that build-and-test cannot provide well, for example:

  • VPTO host simulator validation
  • PyPTO simulator smoke
  • PTODSL DSL ST through simulator / msprof path
  • TileLang ST only if it provides simulator-specific value beyond the main CI

In other words, vpto-sim-validation should be the “simulator extension layer”, not a second full build-and-test pipeline.

5. Optional follow-up: split ci_sim by sub-area

A later follow-up can refine path filters so that touching one area does not always trigger all of:

  • PyPTO smoke
  • VPTO SIM validation
  • TileLang ST
  • PTODSL DSL ST

This is useful, but artifact reuse and duplicate removal should come first.
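To make the follow-up concrete, here is a minimal sketch of mapping changed files to simulator sub-suites. The suite keys and the glob patterns are placeholders invented for this example; the real directory layout and path filters in ci_sim.yml may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from sub-suite to path patterns that should trigger it.
SUITE_PATTERNS = {
    "pypto-smoke": ["pypto/*"],
    "vpto-sim-validation": ["vpto/*"],
    "tilelang-st": ["tilelang/*"],
    "ptodsl-dsl-st": ["ptodsl/*"],
}


def suites_for(changed_files):
    """Return the sub-suites whose patterns match at least one changed file."""
    selected = []
    for suite, patterns in SUITE_PATTERNS.items():
        if any(fnmatchcase(f, p) for f in changed_files for p in patterns):
            selected.append(suite)
    return selected
```

Note that `*` in `fnmatchcase` also matches path separators, so `ptodsl/*` covers nested files such as `ptodsl/a/b.py`.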

Non-goals

This issue does not propose reusing build-and-test artifacts in build_wheel.

build_wheel has a different packaging/distribution purpose and should remain independently validated in its own environment.

Risks / things to validate

  • artifact layout must be stable enough for self-hosted simulator jobs
  • required shared libraries / Python paths must be included or reconstructed reliably
  • self-hosted runner environment may still need local simulator dependencies
  • if ABI / environment assumptions differ between GitHub-hosted and self-hosted runners, artifact packaging must make those differences explicit
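One way to catch layout drift or missing libraries early is to check the downloaded tree before running any simulator test. The sketch below assumes a manifest dict of the form `{"files": {relative_path: sha256_hex}}`; that format and the name `verify_artifact` are assumptions for illustration only.

```python
import hashlib
from pathlib import Path


def verify_artifact(install_dir: str, manifest: dict) -> list:
    """Return a list of problems found; an empty list means the tree matches."""
    root = Path(install_dir)
    problems = []
    for rel_path, expected in sorted(manifest["files"].items()):
        path = root / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
        elif hashlib.sha256(path.read_bytes()).hexdigest() != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

A check like this only covers files listed in the manifest. It does not detect ABI or environment differences between GitHub-hosted and self-hosted runners, which still need to be validated separately.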

Suggested implementation order

  1. Upload PTOAS artifact from build-and-test
  2. Make vpto-sim-validation consume that artifact instead of rebuilding PTOAS
  3. Remove duplicated tilelang-dsl unittest from ci_sim
  4. Re-evaluate whether TileLang ST / PTODSL DSL ST still have unnecessary overlap
  5. Optionally split ci_sim path filters into smaller subdomains

Git commit

Current repo HEAD when filing this issue:
<fill with git rev-parse HEAD>

Labels: enhancement