Remove orphan rules find_clusters and extract_meta #1209

Open
trvrb wants to merge 2 commits into master from cleanup-orphan-rules

@trvrb trvrb commented Jun 27, 2026

Member

Second in the pandemic-era cruft cleanup series (after #1208).

Motivation

Two rules, find_clusters and extract_meta, produce outputs that nothing downstream consumes, so they never run as part of any build. Removing them shrinks the workflow with no behavior change.

What's removed

  • find_clusters (main_workflow.smk) + scripts/find_clusters.py + the cluster: block in defaults/parameters.yaml that only this rule read. Its clusters.tsv output has no downstream consumer.
  • extract_meta (export_for_nextstrain.smk); its extracted_metadata.tsv output has no downstream consumer.
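
As a rough illustration of what "no downstream consumer" means, the sketch below models a workflow as a mapping from rule names to inputs and outputs, then reports rules that no build target depends on. It is a toy model, not how Snakemake resolves its DAG: the rules `align` and `export`, all input lists, and every file name other than clusters.tsv and extracted_metadata.tsv are hypothetical.

```python
from typing import Dict, List, Set

# Toy workflow model: rule name -> declared inputs and outputs.
Rule = Dict[str, List[str]]


def find_orphan_rules(rules: Dict[str, Rule], targets: Set[str]) -> List[str]:
    """Return rules that no target depends on, directly or transitively."""
    # Map every output file to the rule that produces it.
    producer = {out: name for name, rule in rules.items() for out in rule["output"]}
    needed: Set[str] = set()
    # Walk backwards from the targets through the rules that produce them.
    stack = [producer[t] for t in targets if t in producer]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        for inp in rules[name]["input"]:
            if inp in producer:
                stack.append(producer[inp])
    return sorted(set(rules) - needed)


# Hypothetical rule graph; only the two orphan rule names and their
# output file names come from this PR.
rules = {
    "align": {"input": ["sequences.fasta"], "output": ["aligned.fasta"]},
    "find_clusters": {"input": ["aligned.fasta"], "output": ["clusters.tsv"]},
    "export": {"input": ["aligned.fasta", "metadata.tsv"], "output": ["auspice.json"]},
    "extract_meta": {"input": ["metadata.tsv"], "output": ["extracted_metadata.tsv"]},
}
orphans = find_orphan_rules(rules, targets={"auspice.json"})
print(orphans)  # ['extract_meta', 'find_clusters']
```

In this model a rule is an orphan when nothing reachable from a build target reads any of its outputs, which is why removing it leaves the set of jobs for those targets unchanged.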

Deliberately kept

The manual maintenance rules clean_export_regions / export_all_regions are kept. Unlike the two rules above, they implement a procedure documented in-file (see the header comment in export_for_nextstrain.smk) for re-curating lat_longs.tsv and color_ordering.tsv after an AWS data download, using scripts/check_missing_locations.py, and they operate on files that are still in active use. If you'd rather these go too, say so and I'll drop them.

Verification

  • No remaining references to the removed rules, their outputs, the script, or the cluster config across workflow/, defaults/, docs/, nextstrain_profiles/.
  • The dry run snakemake --profile nextstrain_profiles/nextstrain-ci -n is unchanged (37-job DAG, no errors), confirming that these rules were not in the CI build's DAG.
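
The reference check in the first bullet could be reproduced with something like the sketch below, which does a plain substring search over the four directories named above. This is an assumption about one way to do it, not the method actually used for this PR; the helper name `find_references` is hypothetical, and a substring match will also flag mentions in comments or changelogs.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def find_references(roots: Iterable[str], needles: Iterable[str]) -> Dict[str, List[str]]:
    """Map each needle to the files under roots whose text still mentions it."""
    needles = list(needles)
    hits: Dict[str, List[str]] = {n: [] for n in needles}
    for root in roots:
        base = Path(root)
        if not base.is_dir():
            continue  # tolerate directories absent from this checkout
        for path in sorted(base.rglob("*")):
            if not path.is_file():
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            for needle in needles:
                if needle in text:
                    hits[needle].append(str(path))
    return hits


if __name__ == "__main__":
    # Directory and identifier names are taken from the PR description.
    leftovers = find_references(
        ["workflow", "defaults", "docs", "nextstrain_profiles"],
        ["find_clusters", "extract_meta", "clusters.tsv", "extracted_metadata.tsv"],
    )
    for needle, files in leftovers.items():
        print(needle, files)
```

Run from the repository root, an empty list for every name would match the "no remaining references" claim. The cluster: config key is left out of the search terms because a bare substring match on it would be too broad to be useful.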

Test plan

  • CI green.

🤖 Generated with Claude Code

trvrb and others added 2 commits June 26, 2026 20:17
Both rules produce outputs that are never consumed by any other rule or build
target, so they never run in practice.

- find_clusters (main_workflow.smk) + scripts/find_clusters.py + the cluster:
  config block in defaults/parameters.yaml that only it read. Its clusters.tsv
  output has no downstream consumer.
- extract_meta (export_for_nextstrain.smk); its extracted_metadata.tsv output
  has no downstream consumer.

The manual maintenance rules clean_export_regions / export_all_regions are kept
deliberately: they are a documented in-file procedure (re-curating lat_longs and
color_ordering after an AWS data download via scripts/check_missing_locations.py)
that operates on still-active files.

A dry-run of the CI profile is unchanged (37-job DAG, no errors).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>