
fix pipeline #84

Merged: dimalvovs merged 5 commits into main from nextflow on Nov 11, 2025


@dimalvovs (Collaborator)

This pull request updates the spacemarkers pipeline to improve reproducibility, flexibility, and gene selection logic. The most important changes include pinning all container images to a specific digest for reproducibility, updating function calls to use new snake_case naming conventions, and adding configurable parameters for ligand-receptor gene filtering and gene expression thresholds. Additionally, the logic for gene selection in the HD pipeline is now controlled by pipeline parameters.

Reproducibility and Container Management:

  • All references to the spacemarkers container in Nextflow processes are now pinned to a specific image digest (sha256:9c06f8f...) instead of using floating tags. This ensures consistent environments across runs. [1] [2] [3] [4] [5]
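To illustrate why the digest matters, here is a minimal Python sketch. It is not part of the pipeline. It treats an image reference as pinned only when the reference ends in `@sha256:` followed by 64 lowercase hex characters. The helper `is_digest_pinned` is hypothetical. The two references are the before and after values quoted in the review thread below.

```python
import re

# A reference counts as digest-pinned when it ends in "@sha256:" plus
# 64 lowercase hex characters. A tag such as ":main" can later be moved
# to a different image; a digest always identifies the same image.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if image_ref names an exact image by its sha256 digest."""
    return DIGEST_RE.search(image_ref) is not None


tagged = "ghcr.io/deshpandelab/spacemarkers:main"
pinned = (
    "ghcr.io/deshpandelab/spacemarkers@sha256:"
    "9c06f8f9340bb5c51300dbf3bc4e803613a15e1bd349eae43d5a129462a13f4e"
)

print(is_digest_pinned(tagged))  # False
print(is_digest_pinned(pinned))  # True
```

The tradeoff is readability. A digest says nothing about which release it corresponds to, but it cannot drift between runs.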

Function Naming Consistency:

  • Updated several R function calls in main.nf to use snake_case (e.g., get_spatial_features, get_spatial_parameters, find_all_hotspots, calculate_overlap_undirected, get_pairwise_interacting_genes) for consistency with updated codebase conventions. [1] [2]
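The renaming is mechanical. A minimal Python sketch of the camelCase to snake_case mapping follows. The camelCase inputs are assumptions inferred from the new snake_case names, not quoted from the diff, and `camel_to_snake` is a hypothetical helper.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier to snake_case.

    Inserts "_" before each uppercase letter that directly follows a
    lowercase letter or digit, then lowercases the whole string.
    Runs of capitals (acronyms) are not split.
    """
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


# Hypothetical old spellings, inferred from the renamed calls in main.nf.
for old in [
    "getSpatialFeatures",
    "getSpatialParameters",
    "findAllHotspots",
    "calculateOverlapUndirected",
    "getPairwiseInteractingGenes",
]:
    print(old, "->", camel_to_snake(old))
```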

Parameterization and Flexibility:

  • Added new parameters use_ligand_receptor_genes (boolean) and good_gene_threshold (integer) to nextflow.config for controlling gene filtering behavior in the HD pipeline.

Gene Selection Logic in HD Pipeline:

  • Refactored hdPipeline.R to use the new parameters: if use_ligand_receptor_genes is true, the analysis is restricted to ligand-receptor genes using the CellChat database; otherwise, the top good_gene_threshold most highly expressed genes are selected for analysis. [1] [2] [3]
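The branch described above can be sketched in Python. This sketch is an illustration under stated assumptions and is not the R code in hdPipeline.R:

- The expression matrix is a plain dict.
- The ligand-receptor gene set is passed in rather than loaded from the CellChat database.
- `select_genes` is a hypothetical name.
- The gene values are invented toy numbers.

Only the two parameter names, `use_ligand_receptor_genes` and `good_gene_threshold`, come from the PR.

```python
from typing import Dict, List, Set


def select_genes(
    expression: Dict[str, List[float]],
    use_ligand_receptor_genes: bool,
    good_gene_threshold: int,
    lr_genes: Set[str],
) -> List[str]:
    """Pick the genes to analyse, following the two branches in the PR.

    expression: gene name -> per-spot expression values (toy stand-in for
                the HD count matrix).
    lr_genes:   ligand/receptor gene symbols (stand-in for the CellChat
                lookup, which is not reproduced here).
    """
    if use_ligand_receptor_genes:
        # Restrict the analysis to genes present in the ligand-receptor set.
        return sorted(g for g in expression if g in lr_genes)
    # Otherwise rank genes by total expression and keep the top N.
    ranked = sorted(expression, key=lambda g: sum(expression[g]), reverse=True)
    return ranked[:good_gene_threshold]


# Toy data, invented for illustration only.
expr = {
    "CXCL12": [5, 3, 0],
    "CXCR4": [1, 0, 2],
    "ACTB": [40, 38, 41],
    "GAPDH": [30, 29, 33],
}
print(select_genes(expr, True, 2, {"CXCL12", "CXCR4"}))   # ['CXCL12', 'CXCR4']
print(select_genes(expr, False, 2, {"CXCL12", "CXCR4"}))  # ['ACTB', 'GAPDH']
```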

Output Handling Improvements:

  • The LRscores output in the SPACEMARKERS_HD process is now marked as optional, reflecting that it may not always be produced depending on pipeline parameters.
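Nextflow fails a task when a declared output file is missing, unless that output is declared with `optional: true`. The toy Python model below shows why LRscores has to be optional once it is written only in the ligand-receptor branch. It uses a hypothetical helper, `missing_required_outputs`, and is not Nextflow code. It considers only LRscores.rds, the file whose `saveRDS` call is visible in the review excerpt below.

```python
from typing import Dict, List, Set


def missing_required_outputs(declared: Dict[str, bool], produced: Set[str]) -> List[str]:
    """Return declared outputs that were not produced and are not optional.

    declared: output file name -> True if that output is marked optional.
    produced: file names the task actually wrote.
    A non-empty result corresponds to a failed task.
    """
    return [
        name
        for name, optional in declared.items()
        if name not in produced and not optional
    ]


# use_ligand_receptor_genes = false: the branch that writes LRscores.rds is skipped.
produced_without_lr: Set[str] = set()

print(missing_required_outputs({"LRscores.rds": False}, produced_without_lr))  # ['LRscores.rds']
print(missing_required_outputs({"LRscores.rds": True}, produced_without_lr))   # []
```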

@dimalvovs (Collaborator, Author)

Upstream happy-path tests pass:

🚀 nf-test 0.9.2
https://www.nf-test.com
(c) 2021 - 2024 Lukas Forer and Sebastian Schoenherr


Test Workflow main.nf

  Test [2b1b8044] 'Just check samplesheet and read visium to adata' PASSED (39.283s)
  Test [5d40ee1a] 'Fail on empty samplesheet' PASSED (5.221s)
  Test [abcbcbcb] 'RCTD on Visium SD with local ref' PASSED (132.248s)
  Test [c2f6ac3f] 'RCTD on Multisample Visium SD with local ref' PASSED (453.398s)
  Test [2424d674] 'Full pipeline on Visium SD with remote ref' PASSED (290.981s)
  Test [b008168f] 'Full pipeline on Visium HD with remote ref' PASSED (199.744s)
  Test [c17e5da8] 'Visium SD with external deconvolution' PASSED (117.813s)

Copilot AI (Contributor) left a comment:

Pull Request Overview

This PR refactors the HD pipeline to make ligand-receptor gene analysis configurable and updates container references to use SHA256 digests. It also standardizes function naming conventions to use snake_case.

  • Adds configurable parameters for ligand-receptor gene filtering and gene expression thresholds
  • Updates container references from tag-based to SHA256 digest-based pinning
  • Refactors function names from camelCase to snake_case for consistency

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File | Description
nextflow/templates/hdPipeline.R | Refactored to use configurable parameters, moved ligand-receptor loading inside the conditional block, and updated the "good genes" logic to select the top N genes
nextflow/nextflow.config | Added new parameters for ligand-receptor gene usage and gene threshold configuration
nextflow/main_hd.nf | Updated container to SHA256 digest and made LRscores output optional
nextflow/main.nf | Updated container to SHA256 digest and refactored function names to snake_case


saveRDS(receptor_scores, file = sprintf("%s/receptor_scores.rds", output_dir))
saveRDS(lr_scores, file = sprintf("%s/LRscores.rds", output_dir))
} else {
# Use top "good" genes
Copilot AI commented on Nov 6, 2025:

The comment 'Use top "good" genes' is misleading. It should be updated to 'Select top N genes by expression' or similar to accurately reflect that this selects the top genes by total expression rather than filtering by a threshold.

Suggested change
- # Use top "good" genes
+ # Select top N genes by total expression

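Copilot's point is that "top N" and "above a threshold" are different filters, even though the parameter is named `good_gene_threshold`. The minimal Python sketch below contrasts the two. Both helper names are hypothetical and the totals are toy values. The same number, 2, keeps exactly two genes under top-N, but keeps every gene with a total of at least 2 under thresholding.

```python
from typing import Dict, List


def top_n_by_total(totals: Dict[str, float], n: int) -> List[str]:
    """Keep exactly n genes (fewer if fewer exist), highest totals first."""
    return sorted(totals, key=totals.get, reverse=True)[:n]


def above_threshold(totals: Dict[str, float], t: float) -> List[str]:
    """Keep every gene whose total expression is at least t; the count varies."""
    return [gene for gene, total in totals.items() if total >= t]


totals = {"g1": 120.0, "g2": 45.0, "g3": 8.0, "g4": 2.0}  # toy values

print(top_n_by_total(totals, 2))   # ['g1', 'g2']
print(above_threshold(totals, 2))  # ['g1', 'g2', 'g3', 'g4']
```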
tag "$meta.id"
label 'process_high_memory'
- container 'ghcr.io/deshpandelab/spacemarkers:main'
+ container 'ghcr.io/deshpandelab/spacemarkers@sha256:9c06f8f9340bb5c51300dbf3bc4e803613a15e1bd349eae43d5a129462a13f4e'
Member commented:

This is the only one I did not test. Is this the right container address? Wouldn't a named version look nicer?

@dimalvovs (Collaborator, Author) replied:

It would look nicer, but it would provide weaker reproducibility guarantees; voting to keep it as is.

@dimalvovs merged commit ead294d into main on Nov 11, 2025
4 checks passed
@dimalvovs deleted the nextflow branch on November 11, 2025, 15:08

Labels: None yet
Projects: None yet
Participants: 3