upstream happy case tests pass
Pull Request Overview
This PR refactors the HD pipeline to make ligand-receptor gene analysis configurable and updates container references to use SHA256 digests. It also standardizes function naming conventions to use snake_case.
- Adds configurable parameters for ligand-receptor gene filtering and gene expression thresholds
- Updates container references from tag-based to SHA256 digest-based pinning
- Refactors function names from camelCase to snake_case for consistency
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| nextflow/templates/hdPipeline.R | Refactored to use configurable parameters, moved ligand-receptor loading inside conditional block, and updated "good genes" logic to select top N genes |
| nextflow/nextflow.config | Added new parameters for ligand-receptor gene usage and gene threshold configuration |
| nextflow/main_hd.nf | Updated container to SHA256 digest and made LRscores output optional |
| nextflow/main.nf | Updated container to SHA256 digest and refactored function names to snake_case |

      saveRDS(receptor_scores, file = sprintf("%s/receptor_scores.rds", output_dir))
      saveRDS(lr_scores, file = sprintf("%s/LRscores.rds", output_dir))
    } else {
      # Use top "good" genes

The comment 'Use top "good" genes' is misleading. It should be updated to 'Select top N genes by expression' or similar to accurately reflect that this selects the top genes by total expression rather than filtering by a threshold.

Suggested change:

    - # Use top "good" genes
    + # Select top N genes by total expression

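To make the suggested wording concrete, here is a minimal sketch of what "top N genes by total expression" could look like in R. The helper name `select_top_genes`, the matrix layout (genes in rows, spots in columns), and the toy data are assumptions for illustration; this is not the code in hdPipeline.R.

```r
# Hypothetical helper: rank genes by total expression and keep the top n.
# Assumes `counts` is a numeric matrix with genes in rows and spots in columns.
select_top_genes <- function(counts, n) {
  totals <- rowSums(counts)
  # Never ask for more genes than the matrix holds
  n <- min(n, length(totals))
  names(sort(totals, decreasing = TRUE))[seq_len(n)]
}

# Toy data (assumed): 4 genes x 3 spots
counts <- matrix(
  c(5, 0, 1,
    9, 8, 7,
    0, 0, 0,
    2, 3, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), paste0("spot", 1:3))
)

top_genes <- select_top_genes(counts, 2)
print(top_genes)
```

Note that nothing here applies an expression cutoff: a gene with zero counts is still returned if n is large enough, which is why "select top N" describes the behavior more accurately than a threshold-style comment.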

      tag "$meta.id"
      label 'process_high_memory'
    - container 'ghcr.io/deshpandelab/spacemarkers:main'
    + container 'ghcr.io/deshpandelab/spacemarkers@sha256:9c06f8f9340bb5c51300dbf3bc4e803613a15e1bd349eae43d5a129462a13f4e'

This is the only one I did not test. Is this the right container address? A named version would look nicer?
It would look nicer, but it would give weaker reproducibility guarantees. Voting to keep as is.
This pull request updates the `spacemarkers` pipeline to improve reproducibility, flexibility, and gene selection logic. The most important changes include pinning all container images to a specific digest for reproducibility, updating function calls to use new snake_case naming conventions, and adding configurable parameters for ligand-receptor gene filtering and gene expression thresholds. Additionally, the logic for gene selection in the HD pipeline is now controlled by pipeline parameters.

Reproducibility and Container Management:
- References to the `spacemarkers` container in Nextflow processes are now pinned to a specific image digest (`sha256:9c06f8f...`) instead of using floating tags. This ensures consistent environments across runs.

Function Naming Consistency:
- Updated function calls in `main.nf` to use snake_case (e.g., `get_spatial_features`, `get_spatial_parameters`, `find_all_hotspots`, `calculate_overlap_undirected`, `get_pairwise_interacting_genes`) for consistency with updated codebase conventions.

Parameterization and Flexibility:
- Added `use_ligand_receptor_genes` (boolean) and `good_gene_threshold` (integer) to `nextflow.config` for controlling gene filtering behavior in the HD pipeline.

Gene Selection Logic in HD Pipeline:
- Updated `hdPipeline.R` to use the new parameters: if `use_ligand_receptor_genes` is true, the analysis is restricted to ligand-receptor genes using the CellChat database; otherwise, the top `good_gene_threshold` most highly expressed genes are selected for analysis.

Output Handling Improvements:
- The `LRscores` output in the `SPACEMARKERS_HD` process is now marked as optional, reflecting that it may not always be produced depending on pipeline parameters.
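
The parameter-driven branch and the optional output described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's code: `choose_genes`, `run_sketch`, the `lr_genes` vector (standing in for genes taken from the CellChat database), and the toy matrix are hypothetical. Only the parameter names `use_ligand_receptor_genes` and `good_gene_threshold` and the file name `LRscores.rds` come from the pull request.

```r
# Hypothetical sketch of the two selection modes.
# `counts`: numeric matrix, genes in rows; `lr_genes`: assumed ligand-receptor gene names.
choose_genes <- function(counts, use_ligand_receptor_genes, good_gene_threshold,
                         lr_genes = character(0)) {
  if (use_ligand_receptor_genes) {
    # Restrict to ligand-receptor genes present in the data
    intersect(rownames(counts), lr_genes)
  } else {
    # Otherwise take the top N genes by total expression
    totals <- sort(rowSums(counts), decreasing = TRUE)
    names(totals)[seq_len(min(good_gene_threshold, length(totals)))]
  }
}

# Hypothetical driver: LRscores.rds is written only in ligand-receptor mode,
# which is why a downstream output declaration would need to be optional.
run_sketch <- function(counts, use_lr, n, lr_genes, output_dir) {
  genes <- choose_genes(counts, use_lr, n, lr_genes)
  if (use_lr) {
    # Placeholder scores (row totals); the real scoring is not reproduced here
    lr_scores <- rowSums(counts[genes, , drop = FALSE])
    saveRDS(lr_scores, file = file.path(output_dir, "LRscores.rds"))
  }
  genes
}

# Toy data (assumed): 4 genes x 3 spots, filled column-wise
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(c("TGFB1", "TGFBR1", "ACTB", "GAPDH"), NULL))
lr_genes <- c("TGFB1", "TGFBR1", "EGFR")

dir_top <- tempfile("hd_top_"); dir.create(dir_top)
dir_lr  <- tempfile("hd_lr_");  dir.create(dir_lr)

top_selected <- run_sketch(counts, FALSE, 2, lr_genes, dir_top)
lr_selected  <- run_sketch(counts, TRUE,  2, lr_genes, dir_lr)

print(top_selected)
print(lr_selected)
print(file.exists(file.path(dir_top, "LRscores.rds")))
print(file.exists(file.path(dir_lr, "LRscores.rds")))
```

In this sketch the top-N run leaves no `LRscores.rds` behind, so a process that declared the file as a required output would fail in that mode; marking it optional avoids that.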