I am using episcanpy to build h5ad files from snATAC-seq data. With episcanpy v0.4.0, I run the following code:
epi_peak = epi.ct.load_peaks(peaks_file)
ann = epi.ct.bld_mtx_bed(fragments_file,
                         feature_region=epi_peak,
                         chromosomes=list(epi_peak.keys()),
                         thread=6)
ann.write_h5ad(outdir + samp + '_pruned.h5ad')
This runs within a Snakemake pipeline, where peaks_file is a sample-specific MACS2 peak file and fragments_file is the BED-format output from cellranger-arc (chrom, start, end, cell_id/barcode).
During QC, I noticed a contiguous block of peaks/features with 0 counts across all cells/barcodes. This affected only a small fraction of the reads (<1% of the total read count), but it left the peaks/features in those regions without any reads/fragments in the affected samples. However, I can see reads/fragments for those regions in the input fragments file, and the MACS2 peaks were called from the same fragments file, so this should not happen. The problem is not reproducible: subsequent runs do not skip the same region. I suspect that a thread is occasionally lost into the ether, resulting in 0 read/fragment counts for a small section of the genome.
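For anyone who wants to check their own output for the same symptom, here is a minimal sketch of how such a block of empty features could be detected. It is not part of episcanpy: the helper name find_zero_runs and the toy numbers are made up for illustration, and it assumes the features are stored in genomic order so that neighbouring columns are neighbouring peaks.

```python
import numpy as np


def find_zero_runs(feature_totals, min_len=2):
    """Return (start, stop) index pairs, stop exclusive, for runs of
    consecutive features whose total count across all cells is zero."""
    is_zero = np.asarray(feature_totals).ravel() == 0
    # Pad with False so that runs touching either end are still detected.
    padded = np.concatenate(([False], is_zero, [False]))
    change = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(change == 1)   # first index of each zero run
    stops = np.flatnonzero(change == -1)   # one past the last index
    return [(int(s), int(e)) for s, e in zip(starts, stops) if e - s >= min_len]


# Toy per-feature totals (hypothetical). For a real AnnData object they
# could be computed as: np.asarray(ann.X.sum(axis=0)).ravel()
toy_totals = np.array([5, 3, 0, 0, 0, 0, 7, 0, 2, 9])
zero_runs = find_zero_runs(toy_totals, min_len=3)
print(zero_runs)
```

Isolated empty peaks are expected in sparse data, so min_len should be set high enough that only suspiciously long runs are reported. The returned indices can then be mapped back to peak coordinates through ann.var_names and compared against the fragments file.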