Sync by DOH-LAF2303 · Pull Request #17 · NW-PaGe/rsv

DOH-LAF2303 · 2025-12-10T22:28:14Z

Description of proposed changes

Pulls in upstream changes, updates custom_rules to fit them, and changes the build to a 6y tree instead of all-time.

Related issue(s)

Checklist

Checks pass
Update changelog

Realized through nextstrain#37 that the ingest pipeline does _not_ trigger the rebuild. The rebuild is just scheduled to run after the ingest workflow. Removing all parameters and references to trigger in this commit so that it does not confuse anyone else in the future. Keeping the schedule as-is since it's been working fine and we are planning to shift pathogen workflows in the future so that they can go from ingest to a build within a single run without going through triggers and S3 interactions.

Simplify the workflow using similar changes to <nextstrain/zika#83>

Also removes extra options:

* `PAT_GITHUB_DISPATCH` is not used in the ingest workflow since it is _not_ triggering the downstream phylo workflow.
* `--printshellcmds` is a default flag already included in `nextstrain build` <https://github.com/nextstrain/cli/blob/7252a9b0d9b6e628500f9e2b991cc16a929f2879/nextstrain/cli/command/build.py#L209>

I noticed in a recent run of the ingest workflow that ~6min was spent waiting for the Batch job to start, so I was curious whether this workflow can run within GH Actions using the docker runtime. My trial run took 9m23s¹ compared to the previous run on AWS Batch, which took 15m21s.² Just going to use the docker runtime since it's faster and free.

¹ <https://github.com/nextstrain/rsv/actions/runs/18957867813>
² <https://github.com/nextstrain/rsv/actions/runs/18881097661>
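As a quick check on the timings quoted above, the difference between the two runs can be worked out directly. This is only arithmetic on the two single runs mentioned in the commit message, not a benchmark; the helper name `to_seconds` is made up for illustration.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds duration to total seconds."""
    return minutes * 60 + seconds


# Durations quoted in the commit message.
docker_run = to_seconds(9, 23)   # trial run on GH Actions with the docker runtime
batch_run = to_seconds(15, 21)   # previous run on AWS Batch

saved = batch_run - docker_run
saved_min, saved_sec = divmod(saved, 60)
print(f"Difference: {saved_min}m{saved_sec}s")
```

The difference comes out just under six minutes, which lines up with the ~6min that the Batch run spent waiting for the job to start.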

…/vendored

subrepo:
  subdir: "shared/vendored"
  merged: "bfbbb68"
upstream:
  origin: "https://github.com/nextstrain/shared"
  branch: "main"
  commit: "bfbbb68"
git-subrepo:
  version: "0.4.6"
  origin: "https://github.com/ingydotnet/git-subrepo"
  commit: "110b9eb"

Updated the filepaths for the rules copied from pathogen-repo-guide to support the `a_or_b` wildcard. It's not clear to me why the workflow uses `a_or_b` instead of a `subtype` wildcard to match the config param `subtypes`, but I'm not going to make the changes to consolidate them here. Removes snakemake_rules/download.smk since the multiple input support uses the Snakemake storage plugins to handle remote files. This bumps the minimum Snakemake version to 8.0.0.

Defines `inputs` to ensure that we only use OPEN PPX data as example data.

Moves the chores.smk to only be included through `custom_rules` because the default workflow would run into warning:

```
CyclicGraphException in rule decompress_metadata in file "/nextstrain/build/workflow/snakemake_rules/merge_inputs.smk", line 62:
Cyclic dependency on rule decompress_metadata.
```

Since the workflow now expects data at `results/{a_or_b}/*`, the built-in copy example data command in pathogen-repo-ci.yaml@v0 no longer works. Instead of updating the v0 workflow, just use the config to start from specific example_data inputs.

This should _not_ require any changes in augur/docker-base/conda-base since those CI workflows are already using this CI config:

* <https://github.com/nextstrain/augur/blob/677d535eda13d370d4099558e0cca29db9abcafd/.github/workflows/ci.yaml#L268>
* <https://github.com/nextstrain/docker-base/blob/9ec2845e06e331877eae5f446fd4adc56cd33d9f/.github/workflows/ci.yml#L219>
* <https://github.com/nextstrain/conda-base/blob/9048d8410e7b3a1a7098dd5c498234a489b8ab0b/.github/workflows/ci.yaml#L148>

Updated to match the pathogen-repo-guide at <https://github.com/nextstrain/pathogen-repo-guide/tree/4784a831fc78bf1cdc416824b26ce36ad4f5bcc2/ingest/build-configs/nextstrain-automation> This simplifies the upload config and makes it easier to understand which files are uploaded to S3 as `*_with_restricted`.

Multiple input sources are expected to be defined in the phylo workflow going forward, so we no longer need to support them here. With the recent switch to PPX data, it also became obvious that multiple sources don't work well when the curations are pretty different.

Extract "OPEN" and "RESTRICTED" data into separate files that are uploaded to S3 separately. This will reduce the amount of duplicate data that we host on S3. Outside of the changes in the workflow, we should delete the previously uploaded "*_with_restricted" files from S3 so that they are not confused with the new "*_restricted" files added here.
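A minimal sketch of the kind of split described above, assuming the metadata TSV carries a column that marks each record as OPEN or RESTRICTED. The column name `dataUseTerms`, the accessions, and the function name are assumptions for illustration; this is not the code used in the ingest workflow.

```python
import csv
import io

# Hypothetical column name and example records, for illustration only.
TERMS_COLUMN = "dataUseTerms"

EXAMPLE_TSV = (
    "accession\tdataUseTerms\n"
    "SEQ_0001\tOPEN\n"
    "SEQ_0002\tRESTRICTED\n"
    "SEQ_0003\tOPEN\n"
)


def split_by_terms(tsv_text, column=TERMS_COLUMN):
    """Group TSV rows into OPEN and RESTRICTED lists based on one column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    groups = {"OPEN": [], "RESTRICTED": []}
    for row in reader:
        terms = row[column]
        if terms not in groups:
            # Fail loudly rather than silently dropping unexpected records.
            raise ValueError(f"Unexpected value in {column!r}: {terms!r}")
        groups[terms].append(row)
    return groups


groups = split_by_terms(EXAMPLE_TSV)
for terms, rows in groups.items():
    print(terms, [row["accession"] for row in rows])
```

Because every record lands in exactly one group, each record would be uploaded once instead of appearing in both an open file and a combined file, which is where the reduction in duplicate data comes from.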

joverlee521 and others added 30 commits September 13, 2023 17:00

Create one sequence index per subtype

276f591

This prevents 16 unnecessary duplicate runs with the same inputs and outputs.

Create one new reference per subtype/gene

8686733

This prevents 18 unnecessary duplicate runs with the same inputs and outputs.

Create one colors file per subtype

9c817ad

This prevents 16 unnecessary duplicate runs with the same inputs and outputs.

phylo: limit time-scoped builds to ancestor with G duplication

af8a9b7

Merge pull request nextstrain#105: Simplify workflow graph

b5572a9

Merge pull request nextstrain#110: phylo: limit time-scoped builds to ancestor with G duplication

8843c64

Merge pull request nextstrain#39 from nextstrain/remove-trigger

be44e6d

ingest: Remove parameters related to trigger

.github/workflows/rebuild: Add input trial_name

d147d08

Add input `trial_name` as a way to start trial runs that deploy to staging. Motivated by recent comment <nextstrain#110 (comment)>

Merge pull request nextstrain#111 from nextstrain/gh-action-workflows

401279d

GH Action workflow updates

ingest/upload: use shared/vendored

3a5667a

ingest: removed vendored

d6169be

Same vendored scripts are now available in shared/vendored

Copy merge_inputs.smk from pathogen-repo-guide

85a23c2

Copied from <https://github.com/nextstrain/pathogen-repo-guide/blob/7b7918d55d5088c5cd4b6b35001dc807dd77f129/phylogenetic/rules/merge_inputs.smk> Subsequent commits will modify rules for this pathogen.

.gitignore: add benchmarks/

157b252

Used in merge_inputs.smk following our Snakemake styleguide <https://docs.nextstrain.org/en/latest/reference/snakemake-style-guide.html#always-use-the-benchmark-directive>

Update example data to latest metadata schema w/ PPX data

81c0217

Updated example data using the updated chore config

```
nextstrain build . update_example_data --configfile config/chores.yaml
```

Update instructions in top-level README

9f50c9c

Reorganized to match our usual phylogenetic READMEs and added new instructions for using the `inputs` and `additional_inputs` params for configuring workflow inputs.
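To illustrate how `inputs` and `additional_inputs` could be combined, here is a rough sketch. The entry keys (`name`, `metadata`, `sequences`), the file paths, and the `gather_inputs` helper are assumptions for illustration; the actual handling lives in merge_inputs.smk and may differ.

```python
def gather_inputs(config):
    """Combine `inputs` and `additional_inputs` and index them by name."""
    all_inputs = [*config.get("inputs", []), *config.get("additional_inputs", [])]
    if not all_inputs:
        raise ValueError("Config must define at least one input")

    names = [entry["name"] for entry in all_inputs]
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"Input names must be unique, found duplicates: {duplicates}")

    return {entry["name"]: entry for entry in all_inputs}


# Hypothetical config: default inputs plus one extra local dataset.
example_config = {
    "inputs": [
        {
            "name": "ppx",
            "metadata": "example_data/metadata.tsv",
            "sequences": "example_data/sequences.fasta",
        },
    ],
    "additional_inputs": [
        {
            "name": "local",
            "metadata": "data/local_metadata.tsv",
            "sequences": "data/local_sequences.fasta",
        },
    ],
}

inputs_by_name = gather_inputs(example_config)
print(list(inputs_by_name))
```

In this sketch, `inputs` holds the default starting data and `additional_inputs` appends to it, so a user can add their own dataset without redefining the defaults.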

ingest: Move upload rules to build-configs/nextstrain-automation

1c8634f

Simplifies the ingest/Snakefile so that the outputs of the workflow are easy to identify, and hides the upload process in a Nextstrain automation build config.

ingest: Remove unused NCBI rules/configs

7084213

ingest: Remove commented code

4304491

phylo: Update to start from multiple inputs

c202aa5

Since the previous commit separates the OPEN and RESTRICTED files on S3, update the phylo config to start from these multiple inputs.

nextclade: reconstruct founder sequences

30ff62f

joverlee521 and others added 4 commits November 17, 2025 11:53

Merge pull request nextstrain#112 from nextstrain/separate-restricted-files

3bac488

Separate files for RESTRICTED sequences

Merge pull request nextstrain#108 from nextstrain/multiple-inputs

fc6c095

phylogenetic: Add standardized multiple inputs

Merge upstream changes

5c16dd9

Merge upstream changes

931ce95

DOH-LAF2303 merged commit bb1bb99 into master Dec 10, 2025
1 check passed

DOH-LAF2303 merged 34 commits into master from sync

4 participants
