Sync #17
Merged
Realized through nextstrain#37 that the ingest pipeline does _not_ trigger the rebuild; the rebuild is simply scheduled to run after the ingest workflow. This commit removes all parameters and references to triggers so that they don't confuse anyone in the future. Keeping the schedule as-is since it's been working fine, and we are planning to shift pathogen workflows in the future so they can go from ingest to a build within a single run, without going through triggers and S3 interactions.
This prevents 16 unnecessary duplicate runs with the same inputs and outputs.
This prevents 18 unnecessary duplicate runs with the same inputs and outputs.
This prevents 16 unnecessary duplicate runs with the same inputs and outputs.
… ancestor with G duplication
ingest: Remove parameters related to trigger
Simplify the workflow with changes similar to <nextstrain/zika#83>.

Also removes extra options:
* `PAT_GITHUB_DISPATCH` is not used in the ingest workflow since it does _not_ trigger the downstream phylo workflow.
* `--printshellcmds` is a default flag already included in `nextstrain build` <https://github.com/nextstrain/cli/blob/7252a9b0d9b6e628500f9e2b991cc16a929f2879/nextstrain/cli/command/build.py#L209>
Add input `trial_name` as a way to start trial runs that deploy to staging. Motivated by a recent comment <nextstrain#110 (comment)>
I noticed in a recent run of the ingest workflow that ~6 min was spent waiting for the Batch job to start, so I was curious whether this workflow can run within GH Actions using the docker runtime. My trial run took 9min23s¹, compared to 15m21s² for the previous run on AWS Batch. Switching to the docker runtime since it's faster and free.

¹ <https://github.com/nextstrain/rsv/actions/runs/18957867813>
² <https://github.com/nextstrain/rsv/actions/runs/18881097661>
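As a sanity check, the difference between the two run times roughly matches the ~6 min spent waiting for the Batch job to start. The small Python sketch below works through the arithmetic. The `to_seconds` helper is hypothetical and only parses the two duration formats used above.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '15m21s' or '9min23s' into seconds."""
    match = re.fullmatch(r"(\d+)m(?:in)?(\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = (int(group) for group in match.groups())
    return minutes * 60 + seconds


docker_run = to_seconds("9min23s")  # GH Actions, docker runtime
batch_run = to_seconds("15m21s")    # previous run on AWS Batch
saved = batch_run - docker_run
print(f"saved {saved // 60}m{saved % 60:02d}s")  # saved 5m58s
```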
GH Action workflow updates
…/vendored

subrepo:
  subdir:   "shared/vendored"
  merged:   "bfbbb68"
upstream:
  origin:   "https://github.com/nextstrain/shared"
  branch:   "main"
  commit:   "bfbbb68"
git-subrepo:
  version:  "0.4.6"
  origin:   "https://github.com/ingydotnet/git-subrepo"
  commit:   "110b9eb"
Same vendored scripts are now available in shared/vendored
Copied from <https://github.com/nextstrain/pathogen-repo-guide/blob/7b7918d55d5088c5cd4b6b35001dc807dd77f129/phylogenetic/rules/merge_inputs.smk> Subsequent commits will modify rules for this pathogen.
Used in merge_inputs.smk, following our Snakemake style guide <https://docs.nextstrain.org/en/latest/reference/snakemake-style-guide.html#always-use-the-benchmark-directive>
Updated the filepaths for the rules copied from pathogen-repo-guide to support the `a_or_b` wildcard. It's not clear to me why the workflow uses `a_or_b` instead of a `subtype` wildcard to match the config param `subtypes`, but I'm not going to make the changes to consolidate them here. Removes snakemake_rules/download.smk since the multiple input support uses the Snakemake storage plugins to handle remote files. This bumps the minimum Snakemake version to 8.0.0.
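Supporting the `a_or_b` wildcard means that a single path template expands into one concrete path per subtype. The minimal Python sketch below illustrates this. It assumes the subtype values `a` and `b` and uses hypothetical file names. The `expand` function here is a simplified re-implementation for illustration only (product combinations only), not Snakemake's own `expand()`.

```python
from itertools import product


def expand(template: str, **wildcards: list[str]) -> list[str]:
    """Fill `template` with every combination of wildcard values.

    Simplified stand-in for Snakemake's expand(): product combinations only.
    """
    keys = list(wildcards)
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*(wildcards[key] for key in keys))
    ]


# Hypothetical file names; the real rules' outputs may differ.
for path in expand(
    "results/{a_or_b}/{name}",
    a_or_b=["a", "b"],
    name=["metadata.tsv", "sequences.fasta"],
):
    print(path)
```

Each subtype gets its own directory under `results/`, so rules can stay subtype-agnostic and let the wildcard select the files.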
Defines `inputs` to ensure that we only use OPEN PPX data as example data.

Moves chores.smk so that it is only included through `custom_rules`, because otherwise the default workflow runs into this warning:
```
CyclicGraphException in rule decompress_metadata in file "/nextstrain/build/workflow/snakemake_rules/merge_inputs.smk", line 62:
Cyclic dependency on rule decompress_metadata.
```
Updated example data using the updated chore config:
```
nextstrain build . update_example_data --configfile config/chores.yaml
```
Reorganized to match our usual phylogenetic READMEs and added new instructions for using the `inputs` and `additional_inputs` params for configuring workflow inputs.
Since the workflow now expects data at `results/{a_or_b}/*`, the
built-in copy example data command in pathogen-repo-ci.yaml@v0 no longer works.
Instead of updating the v0 workflow, just use the config to start from specific
example_data inputs.
This should _not_ require any changes in augur/docker-base/conda-base since
those CI workflows are already using this CI config:
<https://github.com/nextstrain/augur/blob/677d535eda13d370d4099558e0cca29db9abcafd/.github/workflows/ci.yaml#L268>
<https://github.com/nextstrain/docker-base/blob/9ec2845e06e331877eae5f446fd4adc56cd33d9f/.github/workflows/ci.yml#L219>
<https://github.com/nextstrain/conda-base/blob/9048d8410e7b3a1a7098dd5c498234a489b8ab0b/.github/workflows/ci.yaml#L148>
Simplifies the ingest/Snakefile so that it's easy to see what the outputs of the workflow are, and moves the upload process into a Nextstrain automation build config.
Updated to match the pathogen-repo-guide at <https://github.com/nextstrain/pathogen-repo-guide/tree/4784a831fc78bf1cdc416824b26ce36ad4f5bcc2/ingest/build-configs/nextstrain-automation>. This simplifies the upload config and makes it easier to understand which files are uploaded to S3 as `*_with_restricted`.
Multiple input sources are expected to be defined in the phylo workflow going forward, so we no longer need to support them here. With the recent switch to PPX data, it also became obvious that multiple sources don't work well when the curations are very different.
Extract "OPEN" and "RESTRICTED" data into separate files that are uploaded to S3 separately. This will reduce the amount of duplicate data that we host on S3. Outside of the changes in the workflow, we should delete the previously uploaded "*_with_restricted" files from S3 so that they are not confused with the new "*_restricted" files added here.
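Conceptually, the split groups metadata rows by their data use terms before each group is written to its own file. The minimal Python sketch below assumes the metadata is a TSV with a `dataUseTerms` column holding `OPEN` or `RESTRICTED`. The column name and the `split_by_data_use_terms` helper are assumptions for illustration, not the workflow's actual implementation, and the matching split of sequences is omitted.

```python
import csv
import io


def split_by_data_use_terms(
    metadata_tsv: str, column: str = "dataUseTerms"
) -> dict[str, list[dict[str, str]]]:
    """Group TSV metadata rows into OPEN and RESTRICTED (assumed column name)."""
    groups: dict[str, list[dict[str, str]]] = {"OPEN": [], "RESTRICTED": []}
    for row in csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t"):
        terms = row[column]
        if terms not in groups:
            # Fail loudly rather than silently dropping rows with unexpected terms.
            raise ValueError(f"unexpected data use terms {terms!r} in row {row}")
        groups[terms].append(row)
    return groups


sample = "accession\tdataUseTerms\nA1\tOPEN\nB2\tRESTRICTED\nC3\tOPEN\n"
groups = split_by_data_use_terms(sample)
print({terms: [row["accession"] for row in rows] for terms, rows in groups.items()})
# {'OPEN': ['A1', 'C3'], 'RESTRICTED': ['B2']}
```

Because every record lands in exactly one group, nothing is stored twice, which is the point of replacing the combined `*_with_restricted` files.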
Since the previous commit separates the OPEN and RESTRICTED files on S3, update the phylo config to start from these multiple inputs.
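Starting from multiple inputs means combining the entries under `inputs` with any under `additional_inputs` into one list. The minimal Python sketch below does this. The `gather_inputs` helper is hypothetical, and it assumes each entry has a unique `name` plus `metadata`/`sequences` paths. The file paths are placeholders, not the real S3 locations.

```python
def gather_inputs(config: dict) -> list[dict]:
    """Combine `inputs` and `additional_inputs` into one list (hypothetical helper)."""
    entries = list(config.get("inputs", [])) + list(config.get("additional_inputs", []))
    if not entries:
        raise ValueError("config must define `inputs` and/or `additional_inputs`")
    names = [entry["name"] for entry in entries]
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"input names must be unique, got duplicates: {duplicates}")
    return entries


# Placeholder paths; the real config points at the separately uploaded files.
config = {
    "inputs": [
        {"name": "open", "metadata": "open/metadata.tsv.zst", "sequences": "open/sequences.fasta.zst"},
        {"name": "restricted", "metadata": "restricted/metadata.tsv.zst", "sequences": "restricted/sequences.fasta.zst"},
    ],
    "additional_inputs": [
        {"name": "private", "metadata": "data/private_metadata.tsv", "sequences": "data/private_sequences.fasta"},
    ],
}
print([entry["name"] for entry in gather_inputs(config)])  # ['open', 'restricted', 'private']
```

Keeping `additional_inputs` as a separate key lets a user append local data without overriding the default `inputs`.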
…-files Separate files for RESTRICTED sequences
phylogenetic: Add standardized multiple inputs
Description of proposed changes
Pulls in upstream changes, updates custom_rules to fit the new changes, and changes the build to a 6y tree instead of all-time.
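A 6y tree restricts the build to samples from roughly the last six years. The Python sketch below shows one way to compute such a cutoff date. The `years_before` helper and the reference date are hypothetical, and the actual build config may express the window differently.

```python
from datetime import date


def years_before(reference: date, years: int) -> date:
    """Return the date `years` calendar years before `reference`.

    Feb 29 maps to Feb 28 when the target year is not a leap year.
    """
    try:
        return reference.replace(year=reference.year - years)
    except ValueError:
        # Only Feb 29 can fail here, because the target year has no Feb 29.
        return reference.replace(year=reference.year - years, day=28)


# Hypothetical run date; a real build would use the date of the run.
print(years_before(date(2025, 3, 1), 6))  # 2019-03-01
```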
Related issue(s)
Checklist