Skip to content

[ci] refactor: consolidate workflows and reorganize tests#687

Open
FoolPlayer wants to merge 16 commits intomainfrom
fp/refactor_tests_and_ci
Open

[ci] refactor: consolidate workflows and reorganize tests#687
FoolPlayer wants to merge 16 commits intomainfrom
fp/refactor_tests_and_ci

Conversation

@FoolPlayer
Copy link
Copy Markdown
Collaborator

What does this PR do?

Cleans up the CI workflow layer and the tests/ tree so that both are
easier to reason about and extend. No runtime / training code behavior
changes. Grouped into seven logical commits so each can be reviewed
independently.

Motivation: the existing setup had three overlapping problems:

  1. Each of the four top-level test workflows (gpu_unit_tests,
    npu_unit_tests, gpu_e2e_test, npu_e2e_test) hard-coded its own
    container, dependency sync, and pytest path list; a small change
    required editing all four in lock-step.
  2. Test selection was driven by hard-coded file paths in the workflows,
    so any file rename or test split needed matching CI edits.
  3. tests/e2e/test_e2e_parallel.py had grown to 417 lines mixing the
    harness with four modalities' parametrize lists, and two orphan e2e
    training tests with broken imports were still checked in.

Checklist Before Starting

  • No linked issue — this is internal CI hygiene.
  • PR title follows [{modules}] {type}: {description} format.

Test

Local verification on fp/refactor_tests_and_ci:

  • make quality clean (ruff check + ruff format --check, 428 files).
  • pytest --collect-only tests/ collects 491 tests (same as
    pre-refactor).
  • pytest --collect-only -m e2e tests/e2e collects 22 cases
    (same as the old monolithic file).
  • pytest --collect-only -m "e2e and dit" tests/e2e collects exactly
    the single Wan DiT case, which is what the new e2e GPU workflow's
    --extra dit step relies on.
  • The full GPU unit / e2e suites will run on this PR via the refactored
    workflows themselves — that is the end-to-end validation.

API and Usage Example

N/A — no public API, CLI, or config-field changes. Two internal paths
changed for contributors:

  1. Fixture configs moved:
    tests/toy_config/<m>_toy/tests/fixtures/toy_config/<m>_toy/
    tests/testdata/tests/fixtures/testdata/
  2. The single tests/e2e/test_e2e_parallel.py is now four per-modality
    files. When onboarding a new model, add the pytest.param(...) to
    the matching test_e2e_parallel_{text,vlm,omni,dit}.py. See
    updated docs/testing.md and docs/transformers_v5/testing_new_model.md.

Design & Code Changes

Commit-by-commit (newest → oldest, each is its own logical unit):

1. [test] refactor: consolidate tests/toy_config + tests/testdata under tests/fixtures/

  • tests/toy_config/tests/fixtures/toy_config/ (17 *_toy dirs).
  • tests/testdata/tests/fixtures/testdata/ (1 sample image).
  • Updated ~94 references across 22 files (tests/, veomni/data/dummy_dataset.py,
    docs/, .agents/, .gitignore).

2. [test] refactor: split tests/e2e/test_e2e_parallel.py by modality

  • New tests/e2e/_harness.py owns main(), _materialize_weights_dir(),
    DEFAULT_RTOL/ATOL, and the v4_only/v5_only/dit_only marks.
  • Four per-modality test files replace the monolith:
    test_e2e_parallel_text.py, _vlm.py, _omni.py, _dit.py.
  • The DiT file carries pytestmark = [e2e, dit] so the GPU e2e runner
    can pick it up via -m "e2e and dit" in a dedicated --extra dit step.
  • docs/, .agents/skills/veomni-migrate-transformers-v5/SKILL.md, and
    docs/transformers_v5/testing_new_model.md updated to reflect the
    new layout and import names.

3. [ci] refactor: consolidate static checks + move device-api script

  • Merged three single-purpose workflows (check_doc_task_paths.yml,
    device_api_check.yml, check_patchgen.yml) into a single
    static_checks.yml with parallel jobs.
  • The patchgen job is now path-gated: it only runs when files under
    veomni/models/transformers/** change (was always-on before).
  • Moved tests/special_sanity/check_device_api_usage.pyscripts/ci/check_device_api_usage.py
    (it is a standalone script, not a pytest test). Removed the empty
    tests/special_sanity/ directory.

4. [ci] refactor: convert 4 top-level test workflows to thin runner callers

  • gpu_unit_tests.yml, npu_unit_tests.yml, gpu_e2e_test.yml,
    npu_e2e_test.yml are now each ~40 lines: just declare which marker
    to run and which transformers variant, then uses: the reusable
    runner. Container image, volumes, NFS mounts, dep-sync logic live in
    one place.

5. [ci] feat: add reusable _test_runner_{gpu,npu}.yml workflows

  • Two new reusable workflows (workflow_call) own the full GPU/NPU
    test execution pipeline.
  • Inputs: marker (e.g. unit, e2e), transformers_variant
    (stable / v5 / both), run_dit_step (GPU only).
  • transformers_variant=both runs the same selection twice — first
    under the transformers-stable group, then after re-syncing with
    --extra transformers5-exp. When v4 is fully deprecated, callers
    switch to v5 and the v4 branch becomes dead code that can be
    dropped in one commit.
  • run_dit_step adds a third step that syncs --extra dit and runs
    pytest -m "<marker> and dit" — only the GPU e2e caller uses it.

6. [test] feat: add pytest markers + centralized conftest for tests/

  • Registered new markers in pyproject.toml: unit, e2e,
    gpu_only, npu_only, v5_only, dit, benchmark.
  • New tests/conftest.py:
    • session-scoped fixtures ci_models_dir, ci_samples_dir,
      ci_dataset_dir sourced from env vars the reusable workflows set.
    • pytest_collection_modifyitems hook applies device-appropriate
      skips (gpu_only / npu_only based on VEOMNI_TEST_DEVICE) and
      transformers-version skips (v5_only based on
      is_transformers_version_greater_or_equal_to("5.0.0")).
  • Annotated every tests/**/*.py with pytestmark = [pytest.mark.<unit|e2e>, ...]
    so CI can select by marker instead of file path.
  • Added tests/e2e/conftest.py consolidating the six
    dummy_*_dataset session fixtures previously duplicated inside the
    monolithic e2e file.

7. [ci, test] fix: repair broken CI steps and drop orphan e2e training tests

  • gpu_unit_tests.yml: fixed a prior commit that had truncated the
    data-step command to uv run --froze (missing the rest of the line).
  • npu_unit_tests.yml: removed an if: steps.changes.outputs.qwen3_5_ulysses == 'true'
    that referenced a step which no longer exists, silently skipping the test.
  • Deleted tests/e2e/test_e2e_training.py, tests/e2e/test_e2e_training_no_reshard.py,
    tests/e2e/exec_scripts.py — import-broken, never invoked by CI, and
    superseded by tests/train_scripts/. All doc references updated.

Checklist Before Submitting

  • Read the Contribute Guide
  • Applied pre-commit checks (make quality clean)
  • Added/updated documentation (docs/testing.md, docs/transformers_v5/*.md, docs/usage/support_new_models/*.md, .agents/knowledge/architecture.md, .agents/skills/*/SKILL.md)
  • No tasks/ training scripts moved or renamed in this PR
  • Added tests to CI workflow — existing tests re-annotated with markers; CI selection is now by marker, so any new test picked up by -m unit / -m e2e automatically runs.

FoolPlayer and others added 15 commits April 23, 2026 20:36
…ests

- gpu_unit_tests.yml: restore `pytest -s -x tests/data` (prior commit left
  the step as `uv run --froze`, silently disabling data tests on GPU)
- npu_unit_tests.yml: drop `if: steps.changes.outputs.qwen3_5_ulysses` —
  that step never existed in the NPU workflow, so the Qwen3.5 Ulysses SP
  test was never actually running on NPU
- delete tests/e2e/test_e2e_training.py, test_e2e_training_no_reshard.py
  and exec_scripts.py: they used non-relative `from exec_scripts import`
  (broken from repo root), were not wired into any CI workflow, and
  duplicated what test_e2e_parallel.py already covers with toy configs
- prune Level-3 E2E training docs in docs/testing.md and the
  qwen3_omni_moe_example.md onboarding guide to match

Made-with: Cursor
- Register `unit`, `e2e`, `gpu_only`, `npu_only`, `v5_only`, `dit`,
  `benchmark` markers in pyproject.toml. Each test file declares exactly
  one of `unit` / `e2e` via `pytestmark`, so CI can select by marker
  instead of enumerating paths
- New tests/conftest.py: CI NFS path fixtures (`ci_models_dir`,
  `ci_samples_dir`, `ci_dataset_dir`) and a collection hook that applies
  device / transformers-version skips based on the `VEOMNI_TEST_DEVICE`
  env var (set by the workflows). Avoids per-file `skipif` boilerplate
- New tests/e2e/conftest.py: hoist the six duplicated
  `dummy_*_dataset` session fixtures out of test_e2e_parallel.py so the
  planned per-modality split in Step 6 doesn't need to re-plumb them
- Tag out-of-CI ulysses async tests with `@pytest.mark.skip` so marker
  inclusion preserves current behavior; tag `test_video_utils.py` to use
  `os.environ.get("CI_SAMPLES_DIR", "")` instead of crashing at import
  when the env var is unset
- Tag `test_models_logits_equal.py` as `gpu_only` (matches current NPU
  workflow skipping it), `test_vlm_trainer.py` / `test_count_flops.py` /
  `test_qwen3_5_gated_deltanet_ulysses.py` as `v5_only`, and
  `test_npu_setup.py` as `npu_only`

Collection summary: `pytest -m unit` collects 464 tests, `pytest -m e2e`
collects 27 (total 491, no overlap). Device gates verified:
`VEOMNI_TEST_DEVICE=npu pytest test_models_logits_equal.py` skips all 6
cases with reason "gpu_only test skipped on device=npu".

Made-with: Cursor
Replaces the step-by-step hardcoded test invocation in the four top-level
test workflows (gpu_unit_tests, gpu_e2e_test, npu_unit_tests, npu_e2e_test)
with two reusable workflows that each take a pytest marker + a
transformers-version variant and run `pytest -m <marker>` once per
requested variant.

- _test_runner_gpu.yml: L20-8 container (verl-ci-cn-beijing GPU image,
  --gpus all --ipc host --shm-size 32g), `VEOMNI_TEST_DEVICE=gpu`,
  optional `run_dit_step` for the diffusers (`--extra dit`) leg that
  only gpu_e2e uses today
- _test_runner_npu.yml: 910B-8 Ascend container with the full device
  mount list, the ASCEND_VISIBLE_DEVICES slice-detection step, and
  `VEOMNI_TEST_DEVICE=npu`

The `transformers_variant` input takes `stable` / `v5` / `both`. During
the v4 deprecation window callers pass `both`, which runs the marker
selection once under transformers-stable and once under
`--extra transformers5-exp`. When v4 is fully retired the callers flip
to `v5` and the stable-branch steps go dead in one commit, with no
matrix infrastructure to untangle.

Two separate files (instead of a single device-parametrised workflow)
because GHA can't interpolate the `container.volumes` / `container.options`
block as a whole expression; Ascend's 16-device mount list and the L20's
`--gpus all` option are different enough that a merged runner would need
more `if:` gating than it saves in duplication.

These are pure workflow_call definitions — nothing runs yet until the
top-level workflows are rewired to use them (next commit).

Made-with: Cursor
Rewrites gpu_unit_tests / gpu_e2e_test / npu_unit_tests / npu_e2e_test
as ~40-line wrappers around the new reusable _test_runner_{gpu,npu}.yml
workflows (added previous commit). Each caller now declares only:

  - which pytest marker to select (`unit` or `e2e`)
  - the `both` transformers variant (v4 + v5 during the deprecation
    window; flips to `v5` when stable is removed)
  - whether to run the extra `--extra dit` leg (gpu_e2e only)

Net: 458 lines removed, 35 added. Container config, runner labels,
dependency sync ordering, and the ASCEND_VISIBLE_DEVICES slice
detection step all live in one place now, so a future image bump or
runner-label change is a single-file edit instead of four.

The previous workflows hardcoded each test path as its own step
(`pytest -s -x tests/parallel/ulysses/test_ulysses.py`, etc.). That
meant (a) a typo or truncation in one step silently disabled the
corresponding tests (see the two bugs fixed earlier in this branch)
and (b) newly added test files never ran in CI until somebody
remembered to append a new step. Using `pytest -m <marker> tests/`
replaces this with marker-based selection: any new test file that
declares `pytestmark = pytest.mark.unit` (or `e2e`) is picked up
automatically.

The path filters on each caller are extended to also trigger when the
reusable workflow it uses is edited, so changes to
_test_runner_{gpu,npu}.yml still invalidate CI correctly.

Made-with: Cursor
Collapse three ubuntu-latest-only CI workflows into a single
static_checks.yml with three parallel jobs:

- doc_task_paths (was check_doc_task_paths.yml) — verify tasks/*.py
  paths referenced in docs
- device_api (was device_api_check.yml) — lint for raw .cuda / "cuda" /
  "nccl" usage outside the whitelist
- patchgen (was check_patchgen.yml) — transformers patchgen drift
  detection, now path-gated via `git diff` so it only runs the heavy
  `uv sync --extra transformers5-exp` when patchgen-relevant files
  actually changed

Kept separate:
- check_pr_lint.yml — still a standalone workflow_call reusable so the
  four GPU/NPU test workflows can depend on it as a gating job without
  also pulling in the heavier patchgen step
- check_pr_title.yml — uses `pull_request.types: [opened, edited, ...]`
  which would needlessly re-run lint on every title edit if merged

Also moved tests/special_sanity/check_device_api_usage.py to
scripts/ci/check_device_api_usage.py — it was never a pytest test and
the `tests/` tree shouldn't host standalone CI scripts. Updated the
script's self-whitelist entry, its docstring invocation examples, and
the docs/testing.md tree diagram accordingly. The `tests/special_sanity/`
directory is now empty and removed.

Made-with: Cursor
The single 417-line file mixed text / VLM / omni / DiT parametrizations
with the shared torchrun+compare harness, so every model onboarding
diff touched one enormous file. Split into four per-modality files
plus a `_harness.py` module that owns `main()`, default tolerances and
the v4/v5/dit skip marks.

- tests/e2e/_harness.py — shared harness (main, _materialize_weights_dir,
  DEFAULT_RTOL/ATOL, v4_only / v5_only / dit_only marks).
- tests/e2e/test_e2e_parallel_text.py — text LLMs.
- tests/e2e/test_e2e_parallel_vlm.py — Qwen2-VL / Qwen3-VL family +
  Qwen3.5(-MoE) that share the VL dataset fixture.
- tests/e2e/test_e2e_parallel_omni.py — Qwen2.5-Omni / Qwen3-Omni-MoE.
- tests/e2e/test_e2e_parallel_dit.py — Wan DiT. Carries pytestmark =
  [e2e, dit] so the GPU e2e runner can select it with
  `-m "e2e and dit"` in the dedicated `--extra dit` step.

Verified `pytest --collect-only -m e2e tests/e2e` collects the same
22 cases as before, and `-m "e2e and dit"` selects only the Wan case.

Updated all docs and the v5 migration skill to point at the new paths
and import names (`DEFAULT_RTOL`, `v5_only`, etc. from `_harness.py`).
Dataset fixtures already live in tests/e2e/conftest.py so no fixture
relocation was needed.

Made-with: Cursor
…tests/fixtures/

Both directories hold static, checked-in test data (minimal model
configs, sample media) that tests reference by path but never
regenerate. Grouping them under a single tests/fixtures/ umbrella
makes the intent obvious and leaves a natural home for any future
static assets (golden outputs, reference JSON, etc.).

- tests/toy_config/ -> tests/fixtures/toy_config/ (17 model toy dirs)
- tests/testdata/ -> tests/fixtures/testdata/ (1 sample image)

Updated 22 files (~94 occurrences) including tests/, veomni/ core,
docs/, .agents/knowledge/, .agents/skills/, and .gitignore. Verified
`pytest --collect-only tests/` still collects 491 tests and
`make quality` passes.

Made-with: Cursor
@github-actions github-actions Bot added the ci label Apr 23, 2026
@FoolPlayer FoolPlayer changed the title [ci, test] refactor: consolidate workflows and reorganize tests/ [ci] refactor: consolidate workflows and reorganize tests Apr 23, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request reorganizes the testing infrastructure by introducing a fixtures/ directory for static data and splitting the monolithic test_e2e_parallel.py into modality-specific files supported by a shared harness. It also centralizes CI path resolution and marker enforcement (e.g., device and version gates) in a top-level conftest.py. Feedback focuses on improving the e2e harness by using tmp_path to prevent potential race conditions and removing redundant sys.path manipulations across several test files.

Comment thread tests/e2e/_harness.py
Comment on lines +80 to +81
test_path = f"./{model_name}"
os.makedirs(test_path, exist_ok=True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using a hardcoded relative path (./{model_name}) for the test working directory is a high-risk pattern. It can lead to race conditions when tests are executed in parallel (e.g., via pytest-xdist) if multiple test cases share the same model name. Furthermore, if a test fails, the directory is not removed, potentially leaving stale artifacts that could interfere with future runs.

Recommendation: Use a unique temporary directory for each test invocation. The most robust way is to leverage pytest's tmp_path fixture in the test functions and pass it down to this main helper.


from ..tools.launch_utils import find_free_port

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Manual sys.path manipulation is generally discouraged as it can lead to fragile import logic and environment pollution. Given that tests/conftest.py exists, pytest automatically adds the tests/ directory to sys.path during discovery. This manual insertion is redundant and should be removed to maintain a clean test environment.

from pathlib import Path


sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This manual sys.path modification is redundant. Pytest's standard behavior, when a conftest.py is present in the tests/ directory, is to include that directory in the search path. Removing this line simplifies the code and adheres to better testing practices.

import sys


sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Manually adding the parent directory to sys.path is unnecessary when running with pytest and a top-level conftest.py. It is better to rely on standard discovery mechanisms to avoid potential import conflicts and maintainability issues.

After switching the NPU reusable workflow to marker-based selection
(`pytest -m unit tests/`), collection on NPU fails on
`tests/ops/test_fused_moe_split_vs_merged.py` because the module's
top-level `from veomni.ops.kernels.moe._kernels.kernel.moe import ...`
transitively imports `triton`, which is not installed in the Ascend
image.

Pre-refactor the NPU workflow avoided this by hard-coding only
`tests/ops/test_comp.py` in its pytest invocation. Restore that scoping
with a local `tests/ops/conftest.py` whose `collect_ignore` list drops
the GPU-backend-only files when `VEOMNI_TEST_DEVICE=npu`.

Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant