By: Paul Doan (Google Scholar)
- A straightforward Python package for wet-lab biologists to analyze pooled, functional genetic perturbation screens (e.g., CRISPR, RNAi, ORF) coupled with either viability or FACS-based readouts.
- The pre-processing and QC steps of this workflow are duplicated from the Broad GPP's `poola` Python package. Please check it out here.
- The main difference in this `pooled` workflow is in the p-value calculation step. `pooled` determines the p-value for each element (sgRNA, shRNA, or ORF barcode) by comparing its normalized log2-fold change (LFC) gene-level rank against a randomly permuted null distribution of gene-level ranks, whereas `poola`'s Method 1 scales the LFC to a Gaussian distribution. This approach was previously implemented in `R` by Dr. Mikolaj Slabicki from Dr. Benjamin Ebert's lab at Dana-Farber Cancer Institute (Slabicki and Kozicka et al., 2020). Hence, the demo data below is from the paper's genome-wide FACS-based CRISPR-Cas9 screen in a HEK293T cell line engineered to express a GFP/mCherry reporter of CCNK (cyclin K) protein stability (Fig. 2G). Special thanks to Dr. Slabicki, who generously made the original code and data available in the Methods section.
- Special thanks to Dr. John Doench, Peter DeWeirdt, and Dr. Mudra Hegde (co-authors of `poola`) and the team at GPP for their commitment to open-source science. More extensive wet- and dry-lab GPP resources are available here.
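
To make the rank-versus-permuted-null idea concrete, here is a minimal sketch. It is not the package's actual implementation: the function name `permutation_rank_pvalues`, its parameters, and the toy data are hypothetical. It also assumes that the gene-level rank is the mean rank of a gene's elements, and that the null is built by drawing random element sets of the same size.

```python
import numpy as np
import pandas as pd


def permutation_rank_pvalues(lfc, genes, n_perm=2000, seed=0):
    """Empirical one-sided p-values for gene-level mean LFC rank.

    Illustrative only; names and defaults are assumptions, not the pooled API.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({"gene": genes, "lfc": lfc})
    # Rank elements so that the largest LFC receives rank 1 (ties are averaged).
    df["rank"] = df["lfc"].rank(ascending=False)
    grouped = df.groupby("gene")["rank"]
    observed = grouped.mean()
    sizes = grouped.size()
    ranks = df["rank"].to_numpy()

    # Build one sorted null distribution per distinct number of elements per gene:
    # the mean rank of k elements drawn at random without replacement.
    null_by_size = {}
    for k in sizes.unique():
        draws = [rng.permutation(ranks)[:k].mean() for _ in range(n_perm)]
        null_by_size[k] = np.sort(draws)

    pvals = {}
    for gene, obs in observed.items():
        null = null_by_size[sizes[gene]]
        # Count null mean ranks that are at least as extreme (as small) as observed.
        n_le = np.searchsorted(null, obs, side="right")
        # Add-one correction keeps the empirical p-value strictly above zero.
        pvals[gene] = (n_le + 1) / (n_perm + 1)

    return pd.DataFrame({"mean_rank": observed, "p_value": pd.Series(pvals)})


# Toy data (made up): 50 background genes with 4 guides each, plus one enriched gene.
toy_rng = np.random.default_rng(1)
toy_genes = [f"GENE{i}" for i in range(50) for _ in range(4)] + ["HIT"] * 4
toy_lfc = np.concatenate([toy_rng.normal(0, 1, 200), [5.0, 5.5, 6.0, 6.5]])
result = permutation_rank_pvalues(toy_lfc, toy_genes)
print(result.sort_values("p_value").head())
```

This version tests enrichment only (high LFC). A depletion test would rank with `ascending=True` instead. Multiple-testing correction, for example with `statsmodels`, would be a separate step.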
pip install pooled
Current version: 0.0.2
Dependencies:
- pandas
- numpy
- statsmodels
- matplotlib
- gpplot
- adjustText
To see a worked example and all functionality, please launch the accompanying Jupyter notebook.
- How to go from raw FASTQ to read count table
- Genetic Interactions (double Cas9 or Cas12a multi-gRNA)
My knowledge of the design, execution, and analysis of CRISPR screens comes from my time working with my former post-doc mentor Dr. Sandor Spisak, PhD and my PI Dr. Nilay Sethi, MD, PhD at Dana-Farber Cancer Institute/HMS. This package is ultimately my attempt to give back to the highly collaborative scientific and biomedical community I find myself in here in Boston.
I would also recommend that you check out commonly used, well-cited, published tools such as MAGeCK, BAGEL, DrugZ, casTLE, CRISPhieRmix, CRISPRBetaBinomial, MAUDE and many more at this link. This package is simply my attempt at working through the pooled, genetic screen analysis from scratch.
Please leave any questions in the Issues section of this GitHub repo. I'll try to compile some frequently asked questions here as they come in. I highly encourage you to think about the data during the planning phase, as that will greatly inform your screen design (library size, sequencing parameters, treatment time, number of treatment arms, minimum cell coverage, etc.).