Snakemake workflow: `single-cell-preprocess`

A Snakemake workflow for preprocessing multiplexed 10X single-cell RNA-seq data: empty droplet removal, HTO-based sample demultiplexing, doublet detection, QC filtering, and quality metric reporting.

- Snakemake workflow: `single-cell-preprocess`
  - Overview
  - Input data
  - Output
  - Usage
  - Deployment options
  - Authors
  - References

Overview

The workflow is built using Snakemake and consists of the following steps:

1. Empty droplet removal: `DropletUtils::emptyDrops()` distinguishes cell-containing droplets from empty droplets using the raw Cell Ranger count matrix, retaining barcodes that pass the configured FDR threshold.
2. Sample demultiplexing: cellhashR runs up to six HTO-based demultiplexing algorithms (HTODemux, MultiSeqDemux, DropletUtils, GMM-Demux, BFF-raw, BFF-cluster) and calls a consensus singlet assignment per barcode by majority vote.
3. Doublet detection: scDblFinder simulates artificial doublets from the observed droplets and classifies each droplet as a singlet or doublet using a gradient-boosted tree classifier.
4. Conventional QC filtering: per-sample MAD-based outlier removal on library size, feature count, and mitochondrial fraction using `scuttle::isOutlier()`.
5. Per-sample QC metrics: computes and plots summary statistics (median, mean, SD, min, max) per sample or grouping variable.
6. Aggregate QC: aggregates QC metrics across all samples, normalises values to MADs, and renders cross-sample heatmaps for a global quality overview.
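The MAD-based logic behind the QC filtering and aggregate QC steps can be illustrated with a small Python sketch. This is not the workflow's own code (which calls `scuttle::isOutlier()` in R); the function names, the threshold of 3 MADs, and the example values are assumptions chosen for illustration. The 1.4826 factor is the default scaling constant of R's `mad()`.

```python
from statistics import median

# Default scaling constant of R's mad(); makes the MAD comparable to a
# standard deviation for normally distributed data.
MAD_SCALE = 1.4826


def mad_deviations(values):
    """Express each value as its distance from the median in scaled-MAD units."""
    med = median(values)
    mad = MAD_SCALE * median([abs(v - med) for v in values])
    if mad == 0:
        # Simplification for this sketch: with zero spread, report no deviation.
        return [0.0 for _ in values]
    return [(v - med) / mad for v in values]


def is_outlier(values, nmads=3.0, direction="both"):
    """Flag values lying more than `nmads` scaled MADs from the median."""
    devs = mad_deviations(values)
    if direction == "lower":
        return [d < -nmads for d in devs]
    if direction == "higher":
        return [d > nmads for d in devs]
    return [abs(d) > nmads for d in devs]


# Hypothetical mitochondrial fractions for six cells in one sample.
mito_fraction = [0.02, 0.03, 0.025, 0.04, 0.03, 0.35]
print(is_outlier(mito_fraction, direction="higher"))
# -> [False, False, False, False, False, True]
```

The same `mad_deviations()` values are one way to put different metrics on a common scale before plotting them in a cross-sample heatmap.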

Detailed information about input data and workflow configuration can be found in `config/README.md`.

Input data

The workflow expects output from the Cell Ranger `multi` pipeline. Samples are auto-detected as subdirectories of `data/cellranger/`.

| Input | Path | Notes |
| --- | --- | --- |
| Raw feature-barcode matrix | `data/cellranger/{sample}/outs/multi/count/raw_feature_bc_matrix.h5` | Must contain both Gene Expression and Multiplexing Capture (HTO) libraries |
| HTO-to-sample mapping | `results/process_droplets/hto_to_sample_mapping/{sample}/hto_to_sample_mapping.tsv` | Tab-separated; columns: `hto_id`, `sample_name` |
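As a rough illustration of the input layout, the following Python sketch lists sample directories and reads an HTO-to-sample mapping file. It is not taken from the workflow: the helper names are hypothetical, and the only details assumed from the text above are the directory layout and the two column names `hto_id` and `sample_name`.

```python
import csv
from pathlib import Path


def discover_samples(cellranger_dir="data/cellranger"):
    """Return the names of all subdirectories, treated here as sample IDs."""
    root = Path(cellranger_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def raw_matrix_path(sample, cellranger_dir="data/cellranger"):
    """Build the expected path to a sample's raw feature-barcode matrix."""
    return (Path(cellranger_dir) / sample / "outs" / "multi" / "count"
            / "raw_feature_bc_matrix.h5")


def read_hto_mapping(tsv_path):
    """Read a tab-separated mapping file into a dict of hto_id -> sample_name."""
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = {"hto_id", "sample_name"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return {row["hto_id"]: row["sample_name"] for row in reader}


if __name__ == "__main__":
    for sample in discover_samples():
        print(sample, raw_matrix_path(sample))
```

A check like this can be handy before a dry run, for example to confirm that every detected sample has a mapping file with the expected header.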

Output

All outputs are written to `results/process_droplets_pipeline/{CONFIG_FILENAME}/`, where `CONFIG_FILENAME` is set in `workflow/Snakefile` (default: `config`). Change this variable to keep the outputs of runs with different configs in separate directories.

| Directory | Key output files |
| --- | --- |
| `empty/{sample}/` | `whitelist.txt`, `blacklist.txt`, `output.qs`, `plots.pdf`, `session_info.txt` |
| `dehash/{sample}/` | `whitelist.txt`, `barcode_metadata.csv`, `metrics.csv`, `output.csv`, `plots.pdf`, `session_info.txt` |
| `doublet/{sample}/` | `whitelist.txt`, `barcode_metadata.csv`, `output.qs`, `plots.pdf`, `session_info.txt` |
| `filter/{sample}/` | `whitelist.txt`, `barcode_metadata.csv`, `plots.pdf`, `session_info.txt` |
| `qc_sc_sample/{sample}/` | `metrics.csv`, `plots.pdf` |
| `qc_sc_aggregate/` | `metrics.csv`, `heatmaps.pdf` |

The `whitelist.txt` file at each step lists the high-quality barcodes that survive that step; the `barcode_metadata.csv` files carry per-cell annotations accumulated across steps.
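To see how many barcodes survive each step, the whitelists can be counted with a short Python sketch like the one below. It assumes that each `whitelist.txt` holds one barcode per line and that the steps run in the order empty, dehash, doublet, filter; neither detail is confirmed here, and the function names are hypothetical.

```python
from pathlib import Path

# Assumed step order, following the directory names in the output table.
STEPS = ["empty", "dehash", "doublet", "filter"]


def read_whitelist(path):
    """Read barcodes from a text file, assuming one barcode per line."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def survivors_per_step(run_dir, sample, steps=STEPS):
    """Count the barcodes listed in each step's whitelist for one sample."""
    counts = {}
    for step in steps:
        whitelist = Path(run_dir) / step / sample / "whitelist.txt"
        counts[step] = len(read_whitelist(whitelist))
    return counts


if __name__ == "__main__":
    # Hypothetical run directory and sample name.
    run_dir = "results/process_droplets_pipeline/config"
    print(survivors_per_step(run_dir, "sample_1"))
```

A sharp drop between two consecutive steps is a hint to open that step's `plots.pdf` for a closer look.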

Usage

The usage of this workflow is described in the Snakemake Workflow Catalog.

If you use this workflow in a paper, please cite the repository URL or its DOI and the tools listed in the References section.

Deployment options

Change to the workflow directory and adjust options in `config/config.yml`.

    cd path/to/single-cell-preprocess

Perform a dry run to check the workflow before execution:

    snakemake --dry-run

Run with test files using conda:

    snakemake --cores 2 --sdm conda --directory .test

Run with conda inside an apptainer / singularity container:

    snakemake --cores 2 --sdm conda apptainer --directory .test

Run on an HPC cluster via SLURM (recommended for production):

    # Load required modules first
    module load R/4.3.2-gfbf-2023a

    sbatch -J process_droplets_pipeline -p short,long \
      --mem=80G --cpus-per-task=4 \
      --output=%x.log.out --error=%x.log.err \
      --wrap="snakemake -s workflow/Snakefile --cores 4 --rerun-incomplete"

Authors

Liezel Tamon
- University of Oxford
- ORCID profile

References

Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., Nahnsen, S., & Köster, J. Sustainable data analysis with Snakemake. F1000Research, 10:33, 2021. https://doi.org/10.12688/f1000research.29032.2

Lun, A. T. L., Riesenfeld, S., Andrews, T., Dao, T. P., Gomes, T., & Marioni, J. C. EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data. Genome Biology, 20:63, 2019. https://doi.org/10.1186/s13059-019-1662-y

Bimber Lab. cellhashR: A Package for Demultiplexing Cell Hashing Data. R package version 1.2.1, 2026. https://github.com/BimberLab/cellhashR

Germain, P.-L., Lun, A., Garcia Meixide, C., Macnair, W., & Robinson, M. D. Doublet identification in single-cell sequencing data using scDblFinder. F1000Research, 10:979, 2021. https://doi.org/10.12688/f1000research.73600.2

McCarthy, D. J., Campbell, K. R., Lun, A. T. L., & Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics, 33(8), 1179–1186, 2017. https://doi.org/10.1093/bioinformatics/btw777
