fix: migrate CHARLIE runtime activation to mamba-safe flow#149
Merged
Conversation
- replace conda activation strings with mamba-first environment setup in runtime paths\n- keep cluster-specific module loading behavior and reuse activation in module load command paths\n- simplify redirect wrapper delegation and stop masking runtime failures\n- use MAMBA_PREFIX in runtime activation environment\n\nRefs: #148\n\n⚡ generated using AI ⚡
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Pull request overview
This PR migrates the pipeline entrypoint/runtime activation to a Mamba-safe flow (removing Conda activation strings) and simplifies the wrapper dispatch. It also includes a broad formatting/whitespace pass across many workflow scripts and resource/config files.
Changes:
- Updated
charlieplatform initialization to construct a Mamba-first environment setup and removed Conda activation strings. - Simplified
bin/redirectto delegate directly tocharlieand stop masking failures. - Reformatted numerous workflow scripts/configs (Python/R/shell/YAML) for consistent indentation (with some scripts still containing logic issues that should be corrected before merge).
Reviewed changes
Copilot reviewed 54 out of 73 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| workflow/scripts/transcript2gene.py | Re-indented; function safety issue remains (uninitialized variable). |
| workflow/scripts/reformat_hg38_2_hg19.py | Re-indented/refactored prints. |
| workflow/scripts/merge_ReadsPerGene_counts.R | Whitespace-only cleanup. |
| workflow/scripts/merge_counts_tables_2_counts_matrix.py | Large formatting pass (argparse/options) without intended logic changes. |
| workflow/scripts/make_star_index.sh | Minor formatting (line alignment). |
| workflow/scripts/junctions2readids.py | Formatting + minor string literal normalization. |
| workflow/scripts/get_index_rl.py | Formatting only. |
| workflow/scripts/gather_cluster_stats.sh | Minor formatting only. |
| workflow/scripts/fix_refseq_gtf.py | Large formatting-only pass. |
| workflow/scripts/fix_gtfs.py | Formatting-only pass; preserves behavior. |
| workflow/scripts/filter_junction.py | Formatting-only pass. |
| workflow/scripts/filter_junction_human.py | Formatting-only pass. |
| workflow/scripts/filter_dcc.py | Formatting-only pass but exposes/retains filtering logic bugs. |
| workflow/scripts/filter_ciriout.py | Formatting-only pass but exposes/retains filtering logic bugs. |
| workflow/scripts/filter_bam.py | Formatting-only pass. |
| workflow/scripts/filter_bam_for_splice_reads.py | Formatting-only pass. |
| workflow/scripts/filter_bam_for_linear_reads.py | Formatting-only pass. |
| workflow/scripts/filter_bam_for_BSJs.py | Formatting-only pass. |
| workflow/scripts/filter_bam_by_readids.py | Formatting-only pass. |
| workflow/scripts/create_nclscan_per_sample_counts_table.py | Formatting-only pass but retains logic bugs (strand selection + annotation mapping). |
| workflow/scripts/create_mapsplice_per_sample_counts_table.py | Formatting-only pass. |
| workflow/scripts/create_dcc_per_sample_counts_table.py | Formatting-only pass. |
| workflow/scripts/Create_ciri_count_matrix.py | Formatting-only pass. |
| workflow/scripts/create_circExplorer_per_sample_counts_table.py | Formatting-only pass. |
| workflow/scripts/Create_circExplorer_count_matrix.py | Formatting-only pass. |
| workflow/scripts/Create_circExplorer_BSJ_count_matrix.py | Formatting-only pass. |
| workflow/scripts/circExplorer_get_annotated_counts_per_sample.py | Formatting-only pass. |
| workflow/scripts/bam_to_bigwig.sh | Minor formatting only. |
| workflow/scripts/bam_split_by_regions.py | Formatting-only pass. |
| workflow/scripts/bam_get_max_readlen.py | Formatting-only pass. |
| workflow/scripts/apply_junction_filters.py | Formatting-only pass. |
| workflow/scripts/annotate_clear_quant.py | Formatting-only pass. |
| workflow/scripts/_process_bamtobed.py | Formatting-only pass. |
| workflow/scripts/_multifasta2separatefastas.sh | Minor formatting only. |
| workflow/scripts/_merge_circExplorer_found_counts.py | Formatting-only pass. |
| workflow/scripts/_make_master_counts_table.py | Formatting-only pass (contains no-op set_index calls). |
| workflow/scripts/_filter_linear_spliced_readids_w_rid2jid.py | Formatting-only pass. |
| workflow/scripts/_compare_lists.py | Formatting-only pass. |
| workflow/scripts/_collapse_find_circ.py | Formatting-only pass. |
| workflow/scripts/_circExplorer_BSJ_get_strand.py | Formatting-only pass. |
| workflow/scripts/_bedpe2bed.py | Formatting-only pass. |
| workflow/scripts/_bedintersect_to_rid2jid.py | Formatting-only pass. |
| workflow/scripts/_bamtobed2readendsbed.py | Formatting-only pass. |
| workflow/scripts/_bam_get_alignment_stats.py | Formatting-only pass. |
| workflow/scripts/_bam_filter_BSJ_for_HQonly.py | Formatting-only pass. |
| workflow/scripts/_append_splice_site_flanks_to_BSJs.py | Formatting-only pass. |
| workflow/scripts/_add_geneid2genepred.py | Re-indented; function safety issue remains (uninitialized variable). |
| workflow/rules/preprocessing.smk | Removed stray whitespace lines. |
| workflow/envs/clear.yaml | YAML indentation normalization under pip:. |
| resources/TruSeq_and_nextera_adapters.consolidated.fa | Minor formatting (line alignment). |
| resources/NCLscan.config.template | Whitespace cleanup. |
| resources/merge_dataframes.R | Comment whitespace cleanup. |
| resources/dockers/ccbr_clear/Dockerfile | Comment whitespace cleanup. |
| resources/collapse_bed_by_names.py | Formatting-only pass. |
| resources/cluster.json.highmem | Trailing whitespace cleanup. |
| resources/argparse.bash | Removed trailing blank line. |
| docs/dryrun_example.txt | Minor formatting (line alignment). |
| docker/star_ucsc_cufflinks/environment.yml | YAML indentation normalization. |
| docker/dcc/environment.yml | YAML indentation normalization. |
| docker/cutadapt_fqfilter/environment.yml | YAML indentation normalization. |
| docker/circRNA_finder/environment.txt | Minor formatting (line alignment). |
| docker/bowtie1/environment.txt | Minor formatting (line alignment). |
| config/samples.tsv.fulltest | Minor formatting (line alignment). |
| charlie | Runtime activation migration to Mamba-safe PATH-based setup; quoting tweak for eval. |
| bin/redirect | Simplified wrapper to delegate directly to charlie and preserve exit codes. |
| .tests/lint_workdir/ref/dummy | Removed empty placeholder content (file deletion). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| self.start = int(l[1]) | ||
| self.end = int(l[2]) | ||
| self.nreads = int(l[4]) | ||
| self.size = self.start - self.end |
Comment on lines
191
to
+195
| if out.filter_out == False: | ||
| out.filter_by_size(host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max) | ||
| out.filter_by_size( | ||
| host_min=args.host_filter_min, | ||
| host_max=args.host_filter_max, | ||
| virus_min=args.virus_filter_min, |
| self.start = int(l[2]) - 1 | ||
| self.end = int(l[3]) | ||
| self.nreads = int(l[4]) | ||
| self.size = self.start - self.end |
Comment on lines
193
to
+197
| if out.filter_out == False: | ||
| out.filter_by_size(host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max) | ||
| out.filter_by_size( | ||
| host_min=args.host_filter_min, | ||
| host_max=args.host_filter_max, | ||
| virus_min=args.virus_filter_min, |
| "nclscan_annotation", | ||
| ] | ||
|
|
||
| minus_strand = resultsfile[resultsfile["strandd"] == "+"] |
Comment on lines
+192
to
+194
| outdf["nclscan_annotation"] = ( | ||
| outdf["nclscan_annotation"] + 1 | ||
| ) # change 1 to 2 and 0 to 1 ... as 0 is for no annotation |
Comment on lines
+4
to
+11
| def get_id(s, whatid): | ||
| s = s.split() | ||
| for i, j in enumerate(s): | ||
| if j == whatid: | ||
| r = s[i + 1] | ||
| r = r.replace('"', "") | ||
| r = r.replace(";", "") | ||
| return r |
Comment on lines
+4
to
+11
| def get_id(s, whatid): | ||
| s = s.split() | ||
| for i, j in enumerate(s): | ||
| if j == whatid: | ||
| r = s[i + 1] | ||
| r = r.replace('"', "") | ||
| r = r.replace(";", "") | ||
| return r |
Update hardcoded base paths from /gpfs/gsfs10/users/CCBR_Pipeliner to /vf/users/CCBR_Pipeliner in runtime/docs/config references to avoid missing-file failures in dryrun. Refs: #150 ⚡ generated using AI ⚡
…ible Use // placeholders in config template so CLI values are rendered during init. Refs: #151 ⚡ generated using AI ⚡
Stop defaulting to ERCC for init runs that do not pass --additives. Update help text to document blank default. Refs: #151 ⚡ generated using AI ⚡
Stop prepending ccr to PARTITION and submit with norm by default. Refs: #152 ⚡ generated using AI ⚡
Set PARTITION explicitly to norm inside biowulf platform block so submission always targets a valid default partition. Refs: #152 ⚡ generated using AI ⚡
The ccr partition does not exist on the biowulf system. Only the norm partition should be used as the default for all rules. ⚡ generated using AI ⚡
Add timestamped progress messages at major create_index checkpoints to improve runtime visibility and troubleshooting for long index jobs. ⚡ generated using AI ⚡
- In main(), check $SIFCACHE (set by module load ccbrpipeliner) before falling back to /data/$USER/.singularity or $WORKDIR/.singularity - Add --singularity-prefix "$SING_CACHE_DIR" to all three snakemake invocations (local, slurm, dry-run/unlock) so Snakemake looks for pre-built SIFs in the cache directory without pulling from Docker Hub Closes #154 ⚡ generated using AI ⚡
Add a commentary block in run() for all modes (local, slurm, dryrun/unlock) that prints key variables to the terminal before the pipeline starts: - Working directory, Snakefile, config file - Singularity cache and bind paths - Platform, cluster profile, partition (where applicable) ⚡ generated using AI ⚡
unlock() was the only runmode that did not call set_singularity_binds before entering run(), leaving SINGULARITY_BINDS empty in the commentary block and in --singularity-args. ⚡ generated using AI ⚡
dryrun() and touch() were also missing the set_singularity_binds call, leaving SINGULARITY_BINDS empty. ⚡ generated using AI ⚡
Replace the flat Python-only bind path logic with a two-layer approach: 1. Walk up parents of WORKDIR and PIPELINE_HOME; if /data/<name> is a symlink resolving to that ancestor, include both the /data/<name> symlink path and the real path (e.g. /data/Ziegelbauer_lab and /vf/users/Ziegelbauer_lab, /data/CCBR_Pipeliner and /vf/users/CCBR_Pipeliner). 2. Keep Python-derived paths only when they are not ancestors (too broad) or descendants (too specific) of the bash-derived explicit paths. Result for a typical biowulf workdir: -B /lscratch,/data/Ziegelbauer_lab,/vf/users/Ziegelbauer_lab,/data/CCBR_Pipeliner,/vf/users/CCBR_Pipeliner ⚡ generated using AI ⚡
⚡ generated using AI ⚡
⚡ generated using AI ⚡
⚡ generated using AI ⚡
Comment on lines
+180
to
+181
| minus_strand = resultsfile[resultsfile["strandd"] == "+"] | ||
| minus_strand = minus_strand[["chrd", "coordd", "coorda", "strandd", "reads", "case"]] |
Comment on lines
+68
to
+72
| if not attributes["gene_id"] in gene_id_2_gene_name: | ||
| if "gene_name" in attributes: | ||
| gene_id_2_gene_name[attributes["gene_id"]] = attributes["gene_name"] | ||
| else: | ||
| gene_id_2_gene_name["gene_id"] = attributes["gene_id"] |
Comment on lines
+30
to
+34
| self.start = int(l[1]) | ||
| self.end = int(l[2]) | ||
| self.nreads = int(l[4]) | ||
| self.size = self.start - self.end | ||
| self.filter_out = False |
Comment on lines
187
to
200
| for l in alllines: | ||
| out = DCC(entry=l) | ||
| out.set_host_additive_virus(regions=regions) | ||
| out.filter_by_nreads(args.back_spliced_min_reads) | ||
| if out.filter_out == False: | ||
| out.filter_by_size(host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max) | ||
| out.filter_by_size( | ||
| host_min=args.host_filter_min, | ||
| host_max=args.host_filter_max, | ||
| virus_min=args.virus_filter_min, | ||
| virus_max=args.virus_filter_max, | ||
| ) | ||
| if out.filter_out == True: | ||
| outfile.write(l) | ||
| outfile.close() |
Comment on lines
+36
to
+41
| self.chrom = l[1] | ||
| self.start = int(l[2]) - 1 | ||
| self.end = int(l[3]) | ||
| self.nreads = int(l[4]) | ||
| self.size = self.start - self.end | ||
| self.filter_out = False |
Comment on lines
189
to
202
| for l in alllines: | ||
| out = CIRIOUT(entry=l) | ||
| out.set_host_additive_virus(regions=regions) | ||
| out.filter_by_nreads(args.back_spliced_min_reads) | ||
| if out.filter_out == False: | ||
| out.filter_by_size(host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max) | ||
| out.filter_by_size( | ||
| host_min=args.host_filter_min, | ||
| host_max=args.host_filter_max, | ||
| virus_min=args.virus_filter_min, | ||
| virus_max=args.virus_filter_max, | ||
| ) | ||
| if out.filter_out == True: | ||
| outfile.write(l) | ||
| outfile.close() |
Comment on lines
+407
to
+412
| # Step 4: combine -- EXTRA (platform scratch) + explicit + filtered Python | ||
| all_binds="${EXTRA_SINGULARITY_BINDS}" | ||
| [[ -n "$explicit_binds" ]] && all_binds="${all_binds},${explicit_binds}" | ||
| [[ -n "$filtered_python" ]] && all_binds="${all_binds},${filtered_python}" | ||
|
|
||
| SINGULARITY_BINDS="-B ${all_binds}" |
Comment on lines
+4
to
+11
| def get_id(s, whatid): | ||
| s = s.split() | ||
| for i, j in enumerate(s): | ||
| if j == whatid: | ||
| r = s[i + 1] | ||
| r = r.replace('"', "") | ||
| r = r.replace(";", "") | ||
| return r |
Comment on lines
+4
to
+11
| def get_id(s, whatid): | ||
| s = s.split() | ||
| for i, j in enumerate(s): | ||
| if j == whatid: | ||
| r = s[i + 1] | ||
| r = r.replace('"', "") | ||
| r = r.replace(";", "") | ||
| return r |
…BINDS is empty On unknown/non-HPC platforms EXTRA_SINGULARITY_BINDS is empty, causing the previous concatenation to produce "-B ,/path" which is invalid. Build all_binds incrementally using the same :+ guard used elsewhere. ⚡ generated using AI ⚡
- Document SIFCACHE auto-detection from module load ccbrpipeliner - Note that --additives and --viruses are optional (default: blank) enabling host-only circRNA discovery mode - Fix stale version in tutorial --help output (v0.10.0-dev -> v0.13.0) ⚡ generated using AI ⚡
bam_split_by_regions.py: - make --additives and --viruses optional (default: empty string) - handle empty CSV safely via _split_csv() helper - collect all unknown regions before exiting with a descriptive error _create_circExplorer_linear_bam.v2.sh: - only pass --additives/--viruses to bam_split_by_regions.py when non-empty, enabling host-only runs without crashing the script - use array-based write_split_cmd() for cleaner argument handling Snakefile (on_finish_cmd): - add module load ccbrpipeliner so jobby/spooker are available - fix jobby invocation: pipe snakemake.log via stdin - update command name run_jobby_on_snakemake_log -> jobby - add set -exo pipefail and file existence checks for diagnostics ⚡ generated using AI ⚡
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR updates CHARLIE runtime activation paths to remove Conda-specific activation strings and use a Mamba-safe environment setup flow.
Changes
charliewith a Mamba-first command pathCONDA_PREFIXtoMAMBA_PREFIXbin/redirectto delegate directly to main script and stop masking failuresValidation
charlie --helpexits cleanly with no shell-init/libmamba warningscharlie -m=dryrunno longer fails on old conda activation path usageconda activate/conda.shreferences remain in targeted runtime filesCloses #148