Skip to content

fix: migrate CHARLIE runtime activation to mamba-safe flow#149

Merged
kopardev merged 20 commits into
mainfrom
fix/issue-147-mamba-init
Jun 12, 2026
Merged

fix: migrate CHARLIE runtime activation to mamba-safe flow#149
kopardev merged 20 commits into
mainfrom
fix/issue-147-mamba-init

Conversation

@kopardev

@kopardev kopardev commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

Summary

This PR updates CHARLIE runtime activation paths to remove Conda-specific activation strings and use a Mamba-safe environment setup flow.

Changes

  • replaced runtime activation construction in charlie with a Mamba-first command path
  • preserved platform-specific module loading behavior (biowulf/fnlcr)
  • switched runtime env marker from CONDA_PREFIX to MAMBA_PREFIX
  • simplified bin/redirect to delegate directly to main script and stop masking failures

Validation

  • charlie --help exits cleanly with no shell-init/libmamba warnings
  • charlie -m=dryrun no longer fails on old conda activation path usage
  • no conda activate/conda.sh references remain in targeted runtime files

Closes #148

kopardev and others added 2 commits June 5, 2026 17:32
- replace conda activation strings with mamba-first environment setup in runtime paths\n- keep cluster-specific module loading behavior and reuse activation in module load command paths\n- simplify redirect wrapper delegation and stop masking runtime failures\n- use MAMBA_PREFIX in runtime activation environment\n\nRefs: #148\n\n⚡ generated using AI ⚡

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the pipeline entrypoint/runtime activation to a Mamba-safe flow (removing Conda activation strings) and simplifies the wrapper dispatch. It also includes a broad formatting/whitespace pass across many workflow scripts and resource/config files.

Changes:

  • Updated charlie platform initialization to construct a Mamba-first environment setup and removed Conda activation strings.
  • Simplified bin/redirect to delegate directly to charlie and stop masking failures.
  • Reformatted numerous workflow scripts/configs (Python/R/shell/YAML) for consistent indentation (with some scripts still containing logic issues that should be corrected before merge).

Reviewed changes

Copilot reviewed 54 out of 73 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
workflow/scripts/transcript2gene.py Re-indented; function safety issue remains (uninitialized variable).
workflow/scripts/reformat_hg38_2_hg19.py Re-indented/refactored prints.
workflow/scripts/merge_ReadsPerGene_counts.R Whitespace-only cleanup.
workflow/scripts/merge_counts_tables_2_counts_matrix.py Large formatting pass (argparse/options) without intended logic changes.
workflow/scripts/make_star_index.sh Minor formatting (line alignment).
workflow/scripts/junctions2readids.py Formatting + minor string literal normalization.
workflow/scripts/get_index_rl.py Formatting only.
workflow/scripts/gather_cluster_stats.sh Minor formatting only.
workflow/scripts/fix_refseq_gtf.py Large formatting-only pass.
workflow/scripts/fix_gtfs.py Formatting-only pass; preserves behavior.
workflow/scripts/filter_junction.py Formatting-only pass.
workflow/scripts/filter_junction_human.py Formatting-only pass.
workflow/scripts/filter_dcc.py Formatting-only pass but exposes/retains filtering logic bugs.
workflow/scripts/filter_ciriout.py Formatting-only pass but exposes/retains filtering logic bugs.
workflow/scripts/filter_bam.py Formatting-only pass.
workflow/scripts/filter_bam_for_splice_reads.py Formatting-only pass.
workflow/scripts/filter_bam_for_linear_reads.py Formatting-only pass.
workflow/scripts/filter_bam_for_BSJs.py Formatting-only pass.
workflow/scripts/filter_bam_by_readids.py Formatting-only pass.
workflow/scripts/create_nclscan_per_sample_counts_table.py Formatting-only pass but retains logic bugs (strand selection + annotation mapping).
workflow/scripts/create_mapsplice_per_sample_counts_table.py Formatting-only pass.
workflow/scripts/create_dcc_per_sample_counts_table.py Formatting-only pass.
workflow/scripts/Create_ciri_count_matrix.py Formatting-only pass.
workflow/scripts/create_circExplorer_per_sample_counts_table.py Formatting-only pass.
workflow/scripts/Create_circExplorer_count_matrix.py Formatting-only pass.
workflow/scripts/Create_circExplorer_BSJ_count_matrix.py Formatting-only pass.
workflow/scripts/circExplorer_get_annotated_counts_per_sample.py Formatting-only pass.
workflow/scripts/bam_to_bigwig.sh Minor formatting only.
workflow/scripts/bam_split_by_regions.py Formatting-only pass.
workflow/scripts/bam_get_max_readlen.py Formatting-only pass.
workflow/scripts/apply_junction_filters.py Formatting-only pass.
workflow/scripts/annotate_clear_quant.py Formatting-only pass.
workflow/scripts/_process_bamtobed.py Formatting-only pass.
workflow/scripts/_multifasta2separatefastas.sh Minor formatting only.
workflow/scripts/_merge_circExplorer_found_counts.py Formatting-only pass.
workflow/scripts/_make_master_counts_table.py Formatting-only pass (contains no-op set_index calls).
workflow/scripts/_filter_linear_spliced_readids_w_rid2jid.py Formatting-only pass.
workflow/scripts/_compare_lists.py Formatting-only pass.
workflow/scripts/_collapse_find_circ.py Formatting-only pass.
workflow/scripts/_circExplorer_BSJ_get_strand.py Formatting-only pass.
workflow/scripts/_bedpe2bed.py Formatting-only pass.
workflow/scripts/_bedintersect_to_rid2jid.py Formatting-only pass.
workflow/scripts/_bamtobed2readendsbed.py Formatting-only pass.
workflow/scripts/_bam_get_alignment_stats.py Formatting-only pass.
workflow/scripts/_bam_filter_BSJ_for_HQonly.py Formatting-only pass.
workflow/scripts/_append_splice_site_flanks_to_BSJs.py Formatting-only pass.
workflow/scripts/_add_geneid2genepred.py Re-indented; function safety issue remains (uninitialized variable).
workflow/rules/preprocessing.smk Removed stray whitespace lines.
workflow/envs/clear.yaml YAML indentation normalization under pip:.
resources/TruSeq_and_nextera_adapters.consolidated.fa Minor formatting (line alignment).
resources/NCLscan.config.template Whitespace cleanup.
resources/merge_dataframes.R Comment whitespace cleanup.
resources/dockers/ccbr_clear/Dockerfile Comment whitespace cleanup.
resources/collapse_bed_by_names.py Formatting-only pass.
resources/cluster.json.highmem Trailing whitespace cleanup.
resources/argparse.bash Removed trailing blank line.
docs/dryrun_example.txt Minor formatting (line alignment).
docker/star_ucsc_cufflinks/environment.yml YAML indentation normalization.
docker/dcc/environment.yml YAML indentation normalization.
docker/cutadapt_fqfilter/environment.yml YAML indentation normalization.
docker/circRNA_finder/environment.txt Minor formatting (line alignment).
docker/bowtie1/environment.txt Minor formatting (line alignment).
config/samples.tsv.fulltest Minor formatting (line alignment).
charlie Runtime activation migration to Mamba-safe PATH-based setup; quoting tweak for eval.
bin/redirect Simplified wrapper to delegate directly to charlie and preserve exit codes.
.tests/lint_workdir/ref/dummy Removed empty placeholder content (file deletion).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread charlie Outdated
self.start = int(l[1])
self.end = int(l[2])
self.nreads = int(l[4])
self.size = self.start - self.end
Comment on lines 191 to +195
if out.filter_out == False:
out.filter_by_size(host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max)
out.filter_by_size(
host_min=args.host_filter_min,
host_max=args.host_filter_max,
virus_min=args.virus_filter_min,
self.start = int(l[2]) - 1
self.end = int(l[3])
self.nreads = int(l[4])
self.size = self.start - self.end
Comment on lines 193 to +197
if out.filter_out == False:
out.filter_by_size(host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max)
out.filter_by_size(
host_min=args.host_filter_min,
host_max=args.host_filter_max,
virus_min=args.virus_filter_min,
"nclscan_annotation",
]

minus_strand = resultsfile[resultsfile["strandd"] == "+"]
Comment on lines +192 to +194
outdf["nclscan_annotation"] = (
outdf["nclscan_annotation"] + 1
) # change 1 to 2 and 0 to 1 ... as 0 is for no annotation
Comment on lines +4 to +11
def get_id(s, whatid):
s = s.split()
for i, j in enumerate(s):
if j == whatid:
r = s[i + 1]
r = r.replace('"', "")
r = r.replace(";", "")
return r
Comment on lines +4 to +11
def get_id(s, whatid):
s = s.split()
for i, j in enumerate(s):
if j == whatid:
r = s[i + 1]
r = r.replace('"', "")
r = r.replace(";", "")
return r
kopardev added 15 commits June 5, 2026 18:13
Update hardcoded base paths from /gpfs/gsfs10/users/CCBR_Pipeliner to /vf/users/CCBR_Pipeliner in runtime/docs/config references to avoid missing-file failures in dryrun.

Refs: #150

⚡ generated using AI ⚡
…ible

Use // placeholders in config template so CLI values are rendered during init.

Refs: #151

⚡ generated using AI ⚡
Stop defaulting to ERCC for init runs that do not pass --additives.
Update help text to document blank default.

Refs: #151

⚡ generated using AI ⚡
Stop prepending ccr to PARTITION and submit with norm by default.

Refs: #152

⚡ generated using AI ⚡
Set PARTITION explicitly to norm inside biowulf platform block so submission always targets a valid default partition.

Refs: #152

⚡ generated using AI ⚡
The ccr partition does not exist on the biowulf system. Only the norm
partition should be used as the default for all rules.

⚡ generated using AI ⚡
Add timestamped progress messages at major create_index checkpoints to improve runtime visibility and troubleshooting for long index jobs.

⚡ generated using AI ⚡
- In main(), check $SIFCACHE (set by module load ccbrpipeliner) before
  falling back to /data/$USER/.singularity or $WORKDIR/.singularity
- Add --singularity-prefix "$SING_CACHE_DIR" to all three snakemake
  invocations (local, slurm, dry-run/unlock) so Snakemake looks for
  pre-built SIFs in the cache directory without pulling from Docker Hub

Closes #154

⚡ generated using AI ⚡
Add a commentary block in run() for all modes (local, slurm,
dryrun/unlock) that prints key variables to the terminal before
the pipeline starts:
- Working directory, Snakefile, config file
- Singularity cache and bind paths
- Platform, cluster profile, partition (where applicable)

⚡ generated using AI ⚡
unlock() was the only runmode that did not call set_singularity_binds
before entering run(), leaving SINGULARITY_BINDS empty in the
commentary block and in --singularity-args.

⚡ generated using AI ⚡
dryrun() and touch() were also missing the set_singularity_binds
call, leaving SINGULARITY_BINDS empty.

⚡ generated using AI ⚡
Replace the flat Python-only bind path logic with a two-layer approach:

1. Walk up parents of WORKDIR and PIPELINE_HOME; if /data/<name> is a
   symlink resolving to that ancestor, include both the /data/<name>
   symlink path and the real path (e.g. /data/Ziegelbauer_lab and
   /vf/users/Ziegelbauer_lab, /data/CCBR_Pipeliner and
   /vf/users/CCBR_Pipeliner).

2. Keep Python-derived paths only when they are not ancestors (too broad)
   or descendants (too specific) of the bash-derived explicit paths.

Result for a typical biowulf workdir:
  -B /lscratch,/data/Ziegelbauer_lab,/vf/users/Ziegelbauer_lab,/data/CCBR_Pipeliner,/vf/users/CCBR_Pipeliner

⚡ generated using AI ⚡
⚡ generated using AI ⚡

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 62 out of 81 changed files in this pull request and generated 9 comments.

Comment on lines +180 to +181
minus_strand = resultsfile[resultsfile["strandd"] == "+"]
minus_strand = minus_strand[["chrd", "coordd", "coorda", "strandd", "reads", "case"]]
Comment on lines +68 to +72
if not attributes["gene_id"] in gene_id_2_gene_name:
if "gene_name" in attributes:
gene_id_2_gene_name[attributes["gene_id"]] = attributes["gene_name"]
else:
gene_id_2_gene_name["gene_id"] = attributes["gene_id"]
Comment on lines +30 to +34
self.start = int(l[1])
self.end = int(l[2])
self.nreads = int(l[4])
self.size = self.start - self.end
self.filter_out = False
Comment on lines 187 to 200
for l in alllines:
out = DCC(entry=l)
out.set_host_additive_virus(regions=regions)
out.filter_by_nreads(args.back_spliced_min_reads)
if out.filter_out == False:
out.filter_by_size(host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max)
out.filter_by_size(
host_min=args.host_filter_min,
host_max=args.host_filter_max,
virus_min=args.virus_filter_min,
virus_max=args.virus_filter_max,
)
if out.filter_out == True:
outfile.write(l)
outfile.close()
Comment on lines +36 to +41
self.chrom = l[1]
self.start = int(l[2]) - 1
self.end = int(l[3])
self.nreads = int(l[4])
self.size = self.start - self.end
self.filter_out = False
Comment on lines 189 to 202
for l in alllines:
out = CIRIOUT(entry=l)
out.set_host_additive_virus(regions=regions)
out.filter_by_nreads(args.back_spliced_min_reads)
if out.filter_out == False:
out.filter_by_size(host_min=args.host_filter_min,host_max=args.host_filter_max,virus_min=args.virus_filter_min,virus_max=args.virus_filter_max)
out.filter_by_size(
host_min=args.host_filter_min,
host_max=args.host_filter_max,
virus_min=args.virus_filter_min,
virus_max=args.virus_filter_max,
)
if out.filter_out == True:
outfile.write(l)
outfile.close()
Comment thread charlie
Comment on lines +407 to +412
# Step 4: combine -- EXTRA (platform scratch) + explicit + filtered Python
all_binds="${EXTRA_SINGULARITY_BINDS}"
[[ -n "$explicit_binds" ]] && all_binds="${all_binds},${explicit_binds}"
[[ -n "$filtered_python" ]] && all_binds="${all_binds},${filtered_python}"

SINGULARITY_BINDS="-B ${all_binds}"
Comment on lines +4 to +11
def get_id(s, whatid):
s = s.split()
for i, j in enumerate(s):
if j == whatid:
r = s[i + 1]
r = r.replace('"', "")
r = r.replace(";", "")
return r
Comment on lines +4 to +11
def get_id(s, whatid):
s = s.split()
for i, j in enumerate(s):
if j == whatid:
r = s[i + 1]
r = r.replace('"', "")
r = r.replace(";", "")
return r
kopardev added 3 commits June 12, 2026 12:21
…BINDS is empty

On unknown/non-HPC platforms EXTRA_SINGULARITY_BINDS is empty, causing
the previous concatenation to produce "-B ,/path" which is invalid.
Build all_binds incrementally using the same :+ guard used elsewhere.

⚡ generated using AI ⚡
- Document SIFCACHE auto-detection from module load ccbrpipeliner
- Note that --additives and --viruses are optional (default: blank)
  enabling host-only circRNA discovery mode
- Fix stale version in tutorial --help output (v0.10.0-dev -> v0.13.0)

⚡ generated using AI ⚡
bam_split_by_regions.py:
- make --additives and --viruses optional (default: empty string)
- handle empty CSV safely via _split_csv() helper
- collect all unknown regions before exiting with a descriptive error

_create_circExplorer_linear_bam.v2.sh:
- only pass --additives/--viruses to bam_split_by_regions.py when
  non-empty, enabling host-only runs without crashing the script
- use array-based write_split_cmd() for cleaner argument handling

Snakefile (on_finish_cmd):
- add module load ccbrpipeliner so jobby/spooker are available
- fix jobby invocation: pipe snakemake.log via stdin
- update command name run_jobby_on_snakemake_log -> jobby
- add set -exo pipefail and file existence checks for diagnostics

⚡ generated using AI ⚡
@kopardev kopardev merged commit d0d1f7c into main Jun 12, 2026
4 of 5 checks passed
@kopardev kopardev deleted the fix/issue-147-mamba-init branch June 12, 2026 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate CHARLIE runtime activation from Conda strings to Mamba-safe environment setup

2 participants