Analyze single-cell DNA data generated with the Instrument Free method in a two-cell-line experiment (GM12878 and GM24385). If you need support for other cell lines, please contact caleb_thinh_tong@berkeley.edu.
- Input: a folder containing only the .fastq.gz files for R1 and R2
- Output: a .csv matrix of barcodes vs. amplicon metrics
- These amplicon metrics include:
  - Read counts aligning to each target amplicon (i.e., the MissionBio AML panel)
  - Variant calls at the sites selected for the two-cell-line experiment
  - Quality metrics: total read counts, evenness (1 − Gini coefficient), entropy, Jaccard index, etc.
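For reference, here is a minimal sketch of how per-barcode total reads, evenness (1 − Gini) and Shannon entropy could be computed from a barcode × amplicon read-count table with pandas/numpy. It is not the notebook's implementation: the input layout (rows = barcodes, columns = amplicons), the function names `gini`, `shannon_entropy` and `summarize`, and the choice of log base 2 are assumptions for illustration. The Jaccard index is left out because this README does not say which sets it compares.

```python
import numpy as np
import pandas as pd


def gini(counts: np.ndarray) -> float:
    """Gini coefficient of non-negative read counts (0 = perfectly even)."""
    x = np.sort(np.asarray(counts, dtype=float))  # ascending order
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return float("nan")  # assumption: undefined for barcodes with no reads
    ranks = np.arange(1, n + 1)
    # Standard rank-based formula: G = 2*sum(i*x_i) / (n*sum(x)) - (n+1)/n
    return float((2.0 * np.sum(ranks * x)) / (n * total) - (n + 1) / n)


def shannon_entropy(counts: np.ndarray) -> float:
    """Shannon entropy (bits) of the read distribution across amplicons."""
    x = np.asarray(counts, dtype=float)
    total = x.sum()
    if total == 0:
        return float("nan")
    p = x[x > 0] / total  # zero-count amplicons contribute nothing
    return float(-np.sum(p * np.log2(p)))


def summarize(counts: pd.DataFrame) -> pd.DataFrame:
    """counts: hypothetical table, rows = barcodes, columns = amplicons."""
    qc = pd.DataFrame(index=counts.index)
    qc["total_reads"] = counts.sum(axis=1)
    qc["evenness"] = counts.apply(lambda row: 1.0 - gini(row.to_numpy()), axis=1)
    qc["entropy"] = counts.apply(lambda row: shannon_entropy(row.to_numpy()), axis=1)
    # Barcode x (amplicon counts + QC metrics) matrix, ready for .to_csv()
    return counts.join(qc)
```

A barcode whose reads are spread evenly over four amplicons gets evenness 1.0 and entropy 2 bits; one with all reads on a single amplicon gets evenness 0.25 and entropy 0. Calling `summarize(counts).to_csv(...)` would write a matrix in the barcode-by-metric shape described above.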
You also need to install:
- Python 3 and the associated packages (pandas, numpy, json, etc.; see the Jupyter notebook for the full list)
- Cyrille's barcoding-bead demultiplexing and filtering tool (clone the repository): https://github.com/cdelley/scyBCB/
- bbmap (make sure it includes demuxbyname.sh): https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbmap-guide/
- GATK 4.1.3.0 (need this exact version): https://gatk.broadinstitute.org/hc/en-us/sections/360007279472-4-1-3-0
- bwa-mem2: https://github.com/bwa-mem2/bwa-mem2/releases/download/v2.2.1/bwa-mem2-2.2.1_x64-linux.tar.bz2
- samtools: https://www.htslib.org/
- Processed .fa file and reference genome for alignment (link to Dropbox)
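Before running the notebook, it may help to confirm that the command-line tools are reachable. The sketch below assumes the executables are named `demuxbyname.sh`, `gatk`, `bwa-mem2` and `samtools` and are on `PATH`; the names `REQUIRED_TOOLS`, `missing_tools` and `gatk_version` are hypothetical helpers, not part of this repository. It uses `gatk --version` to read the installed version so you can check it against the required 4.1.3.0.

```python
import shutil
import subprocess

# Assumed executable names for the tools listed above (scyBCB is cloned, not checked here).
REQUIRED_TOOLS = ["demuxbyname.sh", "gatk", "bwa-mem2", "samtools"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def gatk_version():
    """Return the text printed by `gatk --version`, or None if gatk is not on PATH."""
    if shutil.which("gatk") is None:
        return None
    result = subprocess.run(["gatk", "--version"], capture_output=True, text=True, check=False)
    # The version text may go to stdout or stderr depending on the launcher.
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    version = gatk_version()
    if version is not None and "4.1.3.0" not in version:
        print("Warning: expected GATK 4.1.3.0, found:", version)
```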