[ci] refactor: consolidate workflows and reorganize tests by FoolPlayer · Pull Request #687 · ByteDance-Seed/VeOmni

FoolPlayer · 2026-04-23T13:59:18Z

What does this PR do?

Cleans up the CI workflow layer and the tests/ tree so that both are
easier to reason about and extend. No runtime / training code behavior
changes. Grouped into seven logical commits so each can be reviewed
independently.

Motivation: the existing setup had three overlapping problems:

Each of the four top-level test workflows (gpu_unit_tests,
npu_unit_tests, gpu_e2e_test, npu_e2e_test) hard-coded its own
container, dependency sync, and pytest path list; a small change
required editing all four in lock-step.

Test selection was driven by hard-coded file paths in the workflows,
so any file rename or test split needed matching CI edits.

tests/e2e/test_e2e_parallel.py had grown to 417 lines mixing the
harness with four modalities' parametrize lists, and two orphan e2e
training tests with broken imports were still checked in.

Checklist Before Starting

No linked issue — this is internal CI hygiene.
PR title follows [{modules}] {type}: {description} format.

Test

Local verification on fp/refactor_tests_and_ci:

make quality clean (ruff check + ruff format --check, 428 files).
pytest --collect-only tests/ collects 491 tests (same as
pre-refactor).
pytest --collect-only -m e2e tests/e2e collects 22 cases
(same as the old monolithic file).
pytest --collect-only -m "e2e and dit" tests/e2e collects exactly
the single Wan DiT case, which is what the new e2e GPU workflow's
--extra dit step relies on.
The full GPU unit / e2e suites will run on this PR via the refactored
workflows themselves — that is the end-to-end validation.

API and Usage Example

N/A — no public API, CLI, or config-field changes. Two internal paths
changed for contributors:

Fixture configs moved:
tests/toy_config/<m>_toy/ → tests/fixtures/toy_config/<m>_toy/
tests/testdata/ → tests/fixtures/testdata/
The single tests/e2e/test_e2e_parallel.py is now four per-modality
files. When onboarding a new model, add the pytest.param(...) to
the matching test_e2e_parallel_{text,vlm,omni,dit}.py. See
updated docs/testing.md and docs/transformers_v5/testing_new_model.md.

Design & Code Changes

Commit-by-commit (newest → oldest, each is its own logical unit):

1. [test] refactor: consolidate tests/toy_config + tests/testdata under tests/fixtures/

tests/toy_config/ → tests/fixtures/toy_config/ (17 *_toy dirs).
tests/testdata/ → tests/fixtures/testdata/ (1 sample image).
Updated ~94 references across 22 files (tests/, veomni/data/dummy_dataset.py,
docs/, .agents/, .gitignore).

2. [test] refactor: split tests/e2e/test_e2e_parallel.py by modality

New tests/e2e/_harness.py owns main(), _materialize_weights_dir(),
DEFAULT_RTOL/ATOL, and the v4_only/v5_only/dit_only marks.
Four per-modality test files replace the monolith:
test_e2e_parallel_text.py, _vlm.py, _omni.py, _dit.py.
The DiT file carries pytestmark = [e2e, dit] so the GPU e2e runner
can pick it up via -m "e2e and dit" in a dedicated --extra dit step.
docs/, .agents/skills/veomni-migrate-transformers-v5/SKILL.md, and
docs/transformers_v5/testing_new_model.md updated to reflect the
new layout and import names.

3. [ci] refactor: consolidate static checks + move device-api script

Merged three single-purpose workflows (check_doc_task_paths.yml,
device_api_check.yml, check_patchgen.yml) into a single
static_checks.yml with parallel jobs.
The patchgen job is now path-gated: it only runs when files under
veomni/models/transformers/** change (was always-on before).
Moved tests/special_sanity/check_device_api_usage.py → scripts/ci/check_device_api_usage.py
(it is a standalone script, not a pytest test). Removed the empty
tests/special_sanity/ directory.

4. [ci] refactor: convert 4 top-level test workflows to thin runner callers

gpu_unit_tests.yml, npu_unit_tests.yml, gpu_e2e_test.yml,
npu_e2e_test.yml are now each ~40 lines: just declare which marker
to run and which transformers variant, then uses: the reusable
runner. Container image, volumes, NFS mounts, dep-sync logic live in
one place.

5. [ci] feat: add reusable _test_runner_{gpu,npu}.yml workflows

Two new reusable workflows (workflow_call) own the full GPU/NPU
test execution pipeline.
Inputs: marker (e.g. unit, e2e), transformers_variant
(stable / v5 / both), run_dit_step (GPU only).
transformers_variant=both runs the same selection twice — first
under the transformers-stable group, then after re-syncing with
--extra transformers5-exp. When v4 is fully deprecated, callers
switch to v5 and the v4 branch becomes dead code that can be
dropped in one commit.
run_dit_step adds a third step that syncs --extra dit and runs
pytest -m "<marker> and dit" — only the GPU e2e caller uses it.

6. [test] feat: add pytest markers + centralized conftest for tests/

Registered new markers in pyproject.toml: unit, e2e,
gpu_only, npu_only, v5_only, dit, benchmark.
New tests/conftest.py:
- session-scoped fixtures ci_models_dir, ci_samples_dir,
  ci_dataset_dir sourced from env vars the reusable workflows set.
- pytest_collection_modifyitems hook applies device-appropriate
  skips (gpu_only / npu_only based on VEOMNI_TEST_DEVICE) and
  transformers-version skips (v5_only based on
  is_transformers_version_greater_or_equal_to("5.0.0")).
Annotated every tests/**/*.py with pytestmark = [pytest.mark.<unit|e2e>, ...]
so CI can select by marker instead of file path.
Added tests/e2e/conftest.py consolidating the six
dummy_*_dataset session fixtures previously duplicated inside the
monolithic e2e file.

7. [ci, test] fix: repair broken CI steps and drop orphan e2e training tests

gpu_unit_tests.yml: fixed a prior commit that had truncated the
data-step command to uv run --froze (missing the rest of the line).
npu_unit_tests.yml: removed an if: steps.changes.outputs.qwen3_5_ulysses == 'true'
that referenced a step which no longer exists, silently skipping the test.
Deleted tests/e2e/test_e2e_training.py, tests/e2e/test_e2e_training_no_reshard.py,
tests/e2e/exec_scripts.py — import-broken, never invoked by CI, and
superseded by tests/train_scripts/. All doc references updated.

Checklist Before Submitting

Read the Contribute Guide
Applied pre-commit checks (make quality clean)
Added/updated documentation (docs/testing.md, docs/transformers_v5/*.md, docs/usage/support_new_models/*.md, .agents/knowledge/architecture.md, .agents/skills/*/SKILL.md)
No tasks/ training scripts moved or renamed in this PR
Added tests to CI workflow — existing tests re-annotated with markers; CI selection is now by marker, so any new test picked up by -m unit / -m e2e automatically runs.

…ests - gpu_unit_tests.yml: restore `pytest -s -x tests/data` (prior commit left the step as `uv run --froze`, silently disabling data tests on GPU) - npu_unit_tests.yml: drop `if: steps.changes.outputs.qwen3_5_ulysses` — that step never existed in the NPU workflow, so the Qwen3.5 Ulysses SP test was never actually running on NPU - delete tests/e2e/test_e2e_training.py, test_e2e_training_no_reshard.py and exec_scripts.py: they used non-relative `from exec_scripts import` (broken from repo root), were not wired into any CI workflow, and duplicated what test_e2e_parallel.py already covers with toy configs - prune Level-3 E2E training docs in docs/testing.md and the qwen3_omni_moe_example.md onboarding guide to match Made-with: Cursor

- Register `unit`, `e2e`, `gpu_only`, `npu_only`, `v5_only`, `dit`, `benchmark` markers in pyproject.toml. Each test file declares exactly one of `unit` / `e2e` via `pytestmark`, so CI can select by marker instead of enumerating paths - New tests/conftest.py: CI NFS path fixtures (`ci_models_dir`, `ci_samples_dir`, `ci_dataset_dir`) and a collection hook that applies device / transformers-version skips based on the `VEOMNI_TEST_DEVICE` env var (set by the workflows). Avoids per-file `skipif` boilerplate - New tests/e2e/conftest.py: hoist the six duplicated `dummy_*_dataset` session fixtures out of test_e2e_parallel.py so the planned per-modality split in Step 6 doesn't need to re-plumb them - Tag out-of-CI ulysses async tests with `@pytest.mark.skip` so marker inclusion preserves current behavior; tag `test_video_utils.py` to use `os.environ.get("CI_SAMPLES_DIR", "")` instead of crashing at import when the env var is unset - Tag `test_models_logits_equal.py` as `gpu_only` (matches current NPU workflow skipping it), `test_vlm_trainer.py` / `test_count_flops.py` / `test_qwen3_5_gated_deltanet_ulysses.py` as `v5_only`, and `test_npu_setup.py` as `npu_only` Collection summary: `pytest -m unit` collects 464 tests, `pytest -m e2e` collects 27 (total 491, no overlap). Device gates verified: `VEOMNI_TEST_DEVICE=npu pytest test_models_logits_equal.py` skips all 6 cases with reason "gpu_only test skipped on device=npu". Made-with: Cursor

Replaces the step-by-step hardcoded test invocation in the four top-level test workflows (gpu_unit_tests, gpu_e2e_test, npu_unit_tests, npu_e2e_test) with two reusable workflows that each take a pytest marker + a transformers-version variant and run `pytest -m <marker>` once per requested variant. - _test_runner_gpu.yml: L20-8 container (verl-ci-cn-beijing GPU image, --gpus all --ipc host --shm-size 32g), `VEOMNI_TEST_DEVICE=gpu`, optional `run_dit_step` for the diffusers (`--extra dit`) leg that only gpu_e2e uses today - _test_runner_npu.yml: 910B-8 Ascend container with the full device mount list, the ASCEND_VISIBLE_DEVICES slice-detection step, and `VEOMNI_TEST_DEVICE=npu` The `transformers_variant` input takes `stable` / `v5` / `both`. During the v4 deprecation window callers pass `both`, which runs the marker selection once under transformers-stable and once under `--extra transformers5-exp`. When v4 is fully retired the callers flip to `v5` and the stable-branch steps go dead in one commit, with no matrix infrastructure to untangle. Two separate files (instead of a single device-parametrised workflow) because GHA can't interpolate the `container.volumes` / `container.options` block as a whole expression; Ascend's 16-device mount list and the L20's `--gpus all` option are different enough that a merged runner would need more `if:` gating than it saves in duplication. These are pure workflow_call definitions — nothing runs yet until the top-level workflows are rewired to use them (next commit). Made-with: Cursor

Rewrites gpu_unit_tests / gpu_e2e_test / npu_unit_tests / npu_e2e_test as ~40-line wrappers around the new reusable _test_runner_{gpu,npu}.yml workflows (added previous commit). Each caller now declares only: - which pytest marker to select (`unit` or `e2e`) - the `both` transformers variant (v4 + v5 during the deprecation window; flips to `v5` when stable is removed) - whether to run the extra `--extra dit` leg (gpu_e2e only) Net: 458 lines removed, 35 added. Container config, runner labels, dependency sync ordering, and the ASCEND_VISIBLE_DEVICES slice detection step all live in one place now, so a future image bump or runner-label change is a single-file edit instead of four. The previous workflows hardcoded each test path as its own step (`pytest -s -x tests/parallel/ulysses/test_ulysses.py`, etc.). That meant (a) a typo or truncation in one step silently disabled the corresponding tests (see the two bugs fixed earlier in this branch) and (b) newly added test files never ran in CI until somebody remembered to append a new step. Using `pytest -m <marker> tests/` replaces this with marker-based selection: any new test file that declares `pytestmark = pytest.mark.unit` (or `e2e`) is picked up automatically. The path filters on each caller are extended to also trigger when the reusable workflow it uses is edited, so changes to _test_runner_{gpu,npu}.yml still invalidate CI correctly. Made-with: Cursor

Collapse three ubuntu-latest-only CI workflows into a single static_checks.yml with three parallel jobs: - doc_task_paths (was check_doc_task_paths.yml) — verify tasks/*.py paths referenced in docs - device_api (was device_api_check.yml) — lint for raw .cuda / "cuda" / "nccl" usage outside the whitelist - patchgen (was check_patchgen.yml) — transformers patchgen drift detection, now path-gated via `git diff` so it only runs the heavy `uv sync --extra transformers5-exp` when patchgen-relevant files actually changed Kept separate: - check_pr_lint.yml — still a standalone workflow_call reusable so the four GPU/NPU test workflows can depend on it as a gating job without also pulling in the heavier patchgen step - check_pr_title.yml — uses `pull_request.types: [opened, edited, ...]` which would needlessly re-run lint on every title edit if merged Also moved tests/special_sanity/check_device_api_usage.py to scripts/ci/check_device_api_usage.py — it was never a pytest test and the `tests/` tree shouldn't host standalone CI scripts. Updated the script's self-whitelist entry, its docstring invocation examples, and the docs/testing.md tree diagram accordingly. The `tests/special_sanity/` directory is now empty and removed. Made-with: Cursor

The single 417-line file mixed text / VLM / omni / DiT parametrizations with the shared torchrun+compare harness, so every model onboarding diff touched one enormous file. Split into four per-modality files plus a `_harness.py` module that owns `main()`, default tolerances and the v4/v5/dit skip marks. - tests/e2e/_harness.py — shared harness (main, _materialize_weights_dir, DEFAULT_RTOL/ATOL, v4_only / v5_only / dit_only marks). - tests/e2e/test_e2e_parallel_text.py — text LLMs. - tests/e2e/test_e2e_parallel_vlm.py — Qwen2-VL / Qwen3-VL family + Qwen3.5(-MoE) that share the VL dataset fixture. - tests/e2e/test_e2e_parallel_omni.py — Qwen2.5-Omni / Qwen3-Omni-MoE. - tests/e2e/test_e2e_parallel_dit.py — Wan DiT. Carries pytestmark = [e2e, dit] so the GPU e2e runner can select it with `-m "e2e and dit"` in the dedicated `--extra dit` step. Verified `pytest --collect-only -m e2e tests/e2e` collects the same 22 cases as before, and `-m "e2e and dit"` selects only the Wan case. Updated all docs and the v5 migration skill to point at the new paths and import names (`DEFAULT_RTOL`, `v5_only`, etc. from `_harness.py`). Dataset fixtures already live in tests/e2e/conftest.py so no fixture relocation was needed. Made-with: Cursor

…tests/fixtures/ Both directories hold static, checked-in test data (minimal model configs, sample media) that tests reference by path but never regenerate. Grouping them under a single tests/fixtures/ umbrella makes the intent obvious and leaves a natural home for any future static assets (golden outputs, reference JSON, etc.). - tests/toy_config/ -> tests/fixtures/toy_config/ (17 model toy dirs) - tests/testdata/ -> tests/fixtures/testdata/ (1 sample image) Updated 22 files (~94 occurrences) including tests/, veomni/ core, docs/, .agents/knowledge/, .agents/skills/, and .gitignore. Verified `pytest --collect-only tests/` still collects 491 tests and `make quality` passes. Made-with: Cursor

gemini-code-assist

Code Review

This pull request reorganizes the testing infrastructure by introducing a fixtures/ directory for static data and splitting the monolithic test_e2e_parallel.py into modality-specific files supported by a shared harness. It also centralizes CI path resolution and marker enforcement (e.g., device and version gates) in a top-level conftest.py. Feedback focuses on improving the e2e harness by using tmp_path to prevent potential race conditions and removing redundant sys.path manipulations across several test files.

gemini-code-assist · 2026-04-23T14:01:46Z

+    test_path = f"./{model_name}"
+    os.makedirs(test_path, exist_ok=True)


Using a hardcoded relative path (./{model_name}) for the test working directory is a high-risk pattern. It can lead to race conditions when tests are executed in parallel (e.g., via pytest-xdist) if multiple test cases share the same model name. Furthermore, if a test fails, the directory is not removed, potentially leaving stale artifacts that could interfere with future runs.

Recommendation: Use a unique temporary directory for each test invocation. The most robust way is to leverage pytest's tmp_path fixture in the test functions and pass it down to this main helper.

gemini-code-assist · 2026-04-23T14:01:47Z


-from ..tools.launch_utils import find_free_port
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))


Manual sys.path manipulation is generally discouraged as it can lead to fragile import logic and environment pollution. Given that tests/conftest.py exists, pytest automatically adds the tests/ directory to sys.path during discovery. This manual insertion is redundant and should be removed to maintain a clean test environment.

gemini-code-assist · 2026-04-23T14:01:47Z

 from pathlib import Path

+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))


This manual sys.path modification is redundant. Pytest's standard behavior, when a conftest.py is present in the tests/ directory, is to include that directory in the search path. Removing this line simplifies the code and adheres to better testing practices.

gemini-code-assist · 2026-04-23T14:01:47Z

+import sys
+
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))


Manually adding the parent directory to sys.path is unnecessary when running with pytest and a top-level conftest.py. It is better to rely on standard discovery mechanisms to avoid potential import conflicts and maintainability issues.

After switching the NPU reusable workflow to marker-based selection (`pytest -m unit tests/`), collection on NPU fails on `tests/ops/test_fused_moe_split_vs_merged.py` because the module's top-level `from veomni.ops.kernels.moe._kernels.kernel.moe import ...` transitively imports `triton`, which is not installed in the Ascend image. Pre-refactor the NPU workflow avoided this by hard-coding only `tests/ops/test_comp.py` in its pytest invocation. Restore that scoping with a local `tests/ops/conftest.py` whose `collect_ignore` list drops the GPU-backend-only files when `VEOMNI_TEST_DEVICE=npu`. Made-with: Cursor

FoolPlayer and others added 15 commits April 23, 2026 20:36

update ci workflow

a91301d

fix docker host bug

4cb53cb

fix npu ci workflow

6338b07

tmp:disable npu proxy

89985f8

fix npu

58f72ee

fix hf path

043984d

fix tools import

65581f8

fix ci env

e2535dc

github-actions Bot added the ci label Apr 23, 2026

FoolPlayer changed the title ~~[ci, test] refactor: consolidate workflows and reorganize tests/~~ [ci] refactor: consolidate workflows and reorganize tests Apr 23, 2026

gemini-code-assist Bot reviewed Apr 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ci] refactor: consolidate workflows and reorganize tests#687

[ci] refactor: consolidate workflows and reorganize tests#687
FoolPlayer wants to merge 16 commits intomainfrom
fp/refactor_tests_and_ci

FoolPlayer commented Apr 23, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 23, 2026

Uh oh!

gemini-code-assist Bot Apr 23, 2026

Uh oh!

gemini-code-assist Bot Apr 23, 2026

Uh oh!

gemini-code-assist Bot Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		test_path = f"./{model_name}"
		os.makedirs(test_path, exist_ok=True)


		from ..tools.launch_utils import find_free_port

		sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

		from pathlib import Path


		sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))

		import sys


		sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

Conversation

FoolPlayer commented Apr 23, 2026

What does this PR do?

Checklist Before Starting

Test

API and Usage Example

Design & Code Changes

Checklist Before Submitting

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant