DoubletDetection is a Python 3 package to detect doublets (technical errors) in single-cell RNA-seq count matrices.
To install DoubletDetection:
git clone https://github.com/JonathanShor/DoubletDetection.git
cd DoubletDetection
pip3 install .
To run basic doublet classification:
import doubletdetection
clf = doubletdetection.BoostClassifier()
# raw_counts is a cells by genes count matrix
labels = clf.fit(raw_counts).predict()
- raw_counts is a scRNA-seq count matrix (cells by genes), and is array-like
- labels is a 1-dimensional numpy ndarray with the value 1 representing a detected doublet, 0 a singlet, and np.nan an ambiguous cell
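As a minimal sketch of working with the label encoding described above (1 for doublet, 0 for singlet, np.nan for ambiguous), the snippet below counts each category and builds a mask that keeps only confident singlets. The helper name summarize_labels and the example array are hypothetical and not part of DoubletDetection; only numpy is used.

```python
import numpy as np


def summarize_labels(labels):
    """Count doublets, singlets, and ambiguous cells in a label vector.

    Hypothetical helper, not part of DoubletDetection. Assumes the encoding
    described in this README: 1 = doublet, 0 = singlet, np.nan = ambiguous.
    """
    labels = np.asarray(labels, dtype=float)
    return {
        "doublets": int(np.sum(labels == 1)),
        "singlets": int(np.sum(labels == 0)),
        # NaN never compares equal to anything, so it needs np.isnan.
        "ambiguous": int(np.sum(np.isnan(labels))),
    }


# Made-up labels for illustration only, standing in for clf.fit(...).predict().
example_labels = np.array([0, 1, np.nan, 0, 1, 0])
summary = summarize_labels(example_labels)

# Boolean mask selecting confident singlets; ambiguous cells are excluded
# because np.nan == 0 evaluates to False.
singlet_mask = example_labels == 0
```

The same mask can then index the rows of the count matrix, for example raw_counts[singlet_mask] for a dense array, if downstream analysis should use singlets only. Whether to also drop ambiguous cells is a judgment call that depends on the dataset.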
The classifier works best when:
- There are several cell types present in the data
- It is applied individually to each run in an aggregated count matrix
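One way to apply the classifier individually to each run, as the second point suggests, is sketched below. It assumes you have a per-cell array of run identifiers (here called run_ids, a hypothetical name) and a dense, array-like count matrix. The function classify_per_run is illustrative and not part of DoubletDetection; it only relies on the fit(...).predict() pattern shown earlier.

```python
import numpy as np


def classify_per_run(raw_counts, run_ids, make_classifier):
    """Fit a fresh classifier on each run and stitch the labels together.

    Hypothetical helper, not part of DoubletDetection.

    raw_counts      : dense cells-by-genes count matrix (array-like)
    run_ids         : one run identifier per cell (assumed to be available)
    make_classifier : zero-argument callable returning an object that
                      supports clf.fit(X).predict()
    """
    raw_counts = np.asarray(raw_counts)
    run_ids = np.asarray(run_ids)
    if raw_counts.shape[0] != run_ids.shape[0]:
        raise ValueError("run_ids must have one entry per cell")

    # Start with NaN everywhere; each run fills in its own cells.
    labels = np.full(raw_counts.shape[0], np.nan)
    for run in np.unique(run_ids):
        mask = run_ids == run
        clf = make_classifier()  # new classifier per run, no shared state
        labels[mask] = clf.fit(raw_counts[mask]).predict()
    return labels


# Intended usage (requires doubletdetection to be installed):
#   import doubletdetection
#   labels = classify_per_run(raw_counts, run_ids,
#                             doubletdetection.BoostClassifier)
```

Creating a new classifier for every run keeps the runs independent, so synthetic doublets are only ever generated from cells of the same run. For a sparse count matrix the np.asarray conversion would need to be replaced with format-appropriate row indexing.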
In v2.5 we added a new experimental clustering method (scanpy's Louvain clustering) that is much faster than PhenoGraph. We are still validating results from this new clustering method. Please see the notebook below for an example of using this new feature.
See our Jupyter notebook for an example on 8k PBMCs from 10x.
Data can be downloaded from the 10x website.
A bioRxiv submission and journal publication are expected in the coming months. Until then, please cite:
Gayoso, Adam, & Shor, Jonathan. (2018, July 17). DoubletDetection (Version v2.4). Zenodo. http://doi.org/10.5281/zenodo.2678042
This project is licensed under the terms of the MIT license.