Shareable Project for: Decomposing Oncogenic Transcriptional Signatures to Generate Maps of Divergent Cellular States
Authors: William Kim(1), (Kwat) Huwate Yeerna(2), Taylor Cavazos(2), Kate Medetgul-Ernar(2), Clarence Mah(3), Stephanie Ting(2), Jason Park(2), Jill P. Mesirov(2,3), and Pablo Tamayo(2,3).
- (1) Eli and Edythe Broad Institute
- (2) UCSD Moores Cancer Center
- (3) UCSD School of Medicine
Date: April 17, 2017
In this series of notebook chapters, we introduce Onco-GPS (OncoGenic Positioning System), a data-driven analysis framework and associated experimental and computational methodology that uses an oncogenic activation signature to identify multiple cellular states associated with oncogene activation. In this introduction we describe the overall method and then provide a guide to the notebook chapters.
The Onco-GPS methodology decomposes an oncogenic activation signature into its constituent components in such a way that the context dependencies and different modalities of oncogenic activation are made explicit and taken into account. Once characterized and annotated, these components are used to deconstruct and define cellular states, and to map individual samples onto a novel visual paradigm: a two-dimensional Onco-GPS “map.” The resulting model facilitates further molecular characterization and provides an effective tool for analyzing and summarizing complex oncogenic states.
The Onco-GPS approach consists of three major modular steps, as shown in the figure below.
Step I involves the experimental generation of a representative gene expression signature reflecting the activation of an oncogene of interest. In Step II, the resulting signature is decomposed into a set of coherent transcriptional components using a large reference dataset that represents multiple cellular states relevant to the oncogene of interest. These components are also biologically annotated and characterized through further analysis and experimental validation (see article). In Step III, a representative subset of samples and components is selected to define cellular states using a clustering procedure. The selected components also serve as transcriptional coordinates for a two-dimensional map onto which the selected samples are projected, analogous to a geographical GPS system, as shown below.
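The decomposition in Step II uses Non-Negative Matrix Factorization (NMF). As a rough illustration only, the sketch below factors a non-negative genes x samples matrix V into W (gene weights per component) and H (component profiles across samples, one row per component) using the classic Lee and Seung multiplicative updates. It assumes the input has already been restricted to the signature genes and made non-negative; the function name nmf, the iteration count, and the random initialization are illustrative choices, not the notebooks' actual code or settings.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factor a non-negative matrix V (genes x samples) as V ~ W @ H.

    W (genes x k) holds the gene weights of each component; H (k x samples)
    holds each component's profile across samples, one row per component.
    Uses Lee & Seung multiplicative updates for the squared Frobenius error.
    """
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    # Strictly positive random start so the multiplicative updates can move
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep entries non-negative and
        # (per Lee & Seung) do not increase the reconstruction error
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy demo on synthetic data with a known rank-3 structure
rng = np.random.default_rng(1)
V = rng.random((50, 3)) @ rng.random((3, 20))
W, H = nmf(V, k=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In this framing, the rows of H play the role of the component profiles that later chapters annotate and use as map coordinates. NMF solutions depend on initialization, so in practice one would typically compare several random starts rather than trust a single run.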
The Onco-GPS map can also be used to display the association of samples with various genomic features, such as genetic lesions, pathway activation, individual gene expression, genetic dependencies and drug sensitivities. We will use the Onco-GPS approach to explore the complex functional landscape of cancer cell lines with alterations in the RAS/MAPK pathway.
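To give a feel for how samples can be placed relative to transcriptional coordinates, here is a minimal sketch of a simplified ternary projection: each of three selected components is pinned to a vertex of an equilateral triangle, and each sample is drawn at the average of the vertex positions weighted by its normalized component values (barycentric coordinates). This is an assumption-laden simplification, not the Onco-GPS mapping algorithm itself; the function project_ternary and the vertex layout are hypothetical.

```python
import numpy as np

# Hypothetical layout: one triangle vertex per selected component (e.g. C1, C7, C2)
VERTICES = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])


def project_ternary(H3):
    """Map samples to 2-D points from a 3 x n_samples non-negative matrix.

    Each column is normalized to sum to 1 and used as barycentric weights
    on the triangle vertices, so a sample dominated by one component lands
    near that component's vertex.
    """
    H3 = np.asarray(H3, dtype=float)
    if H3.shape[0] != 3:
        raise ValueError("expected exactly 3 component rows")
    if (H3 < 0).any():
        raise ValueError("component values must be non-negative")
    totals = H3.sum(axis=0)
    if (totals <= 0).any():
        raise ValueError("each sample needs a positive total across components")
    weights = H3 / totals  # each column now sums to 1
    return weights.T @ VERTICES  # shape (n_samples, 2): x, y per sample


# Pure samples land on vertices; a balanced sample lands at the centroid
H3 = np.array([[1.0, 0.0, 0.0, 1.0],
               [0.0, 1.0, 0.0, 1.0],
               [0.0, 0.0, 1.0, 1.0]])
xy = project_ternary(H3)
print(xy.round(3))
```

Because the weights are normalized per sample, only the relative balance of the three components matters. Once samples have (x, y) positions, genomic features such as mutation status can be shown simply by coloring or shaping each point. If the components are on very different scales, rescaling each row of H3 first (for example by its maximum) is one reasonable choice.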
Chapter 1: Set up data
- This chapter describes how to download a password-protected dataset and prepares the data in data/ for the chapters to come.
Chapter 2: Generate oncogenic-activation signature
- This chapter generates the oncogenic signature. This is useful if one is interested in creating an Onco-GPS map for a given oncogene (for which one has a dataset or at least a gene set representing its activation).
Chapter 3: Decompose oncogenic-activation signature and define transcriptional components
- This chapter takes the oncogenic signature from chapter 2, or any other signature or gene set of interest, and decomposes it into transcriptional components using Non-Negative Matrix Factorization (NMF).
Chapter 4: Annotate transcriptional components
- This chapter annotates, or characterizes, the transcriptional components found in chapter 3 by matching many types of genomic features to the component profiles (i.e., the rows of the "H" matrix generated in chapter 3). The results are stored in results/match_components.
Chapter 5: Define cellular states and make Onco-GPS map
- This chapter defines the oncogenic states by clustering the KRAS mutant subset of the "H" matrix obtained in chapter 3. It also defines a triangular or ternary Onco-GPS map using components C1, C7 and C2, and then projects the KRAS mutant samples on it.
Chapter 6: Annotate cellular states
- This chapter annotates and characterizes the oncogenic states defined in chapter 5, in the same way that chapter 4 annotates the transcriptional components. The results are stored in results/match_states.
Chapter 7: Display genomic features on Onco-GPS map
- This chapter displays selected genomic features of interest on the KRAS mutant Onco-GPS map, including gene, protein and pathway expression, mutations, tissue types, etc.
Chapter 8: Define global cellular states and make global Onco-GPS map
- This chapter defines the global oncogenic states (S1-S15) and corresponding Onco-GPS map using all the KRAS components (C1-C9) defined in chapter 3.
Chapter 9: Display genomic features on global Onco-GPS map
- This chapter displays selected genomic features of interest on the global Onco-GPS map, including gene, protein and pathway expression, mutations, tissue types, etc.
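Chapter 5 defines cellular states by clustering samples according to their component profiles (the columns of the "H" matrix), and chapter 8 does the same for the global map. The sketch below is a hedged stand-in for that step: it standardizes each component row and then runs k-means (Lloyd's algorithm with k-means++ seeding) on the samples. k-means is swapped in here purely for illustration; the notebooks' own clustering procedure may differ, and the names kmeans and define_states are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X (samples x features) into k groups (Lloyd's algorithm)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # k-means++ seeding: later centers are drawn with probability ~ squared distance
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        if d2.sum() == 0:  # every point coincides with an existing center
            centers.append(X[rng.integers(len(X))])
        else:
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)

    def assign(c):
        # Index of the nearest center for every sample
        return ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        # Keep an empty cluster's old center instead of dropping it
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(centers), centers


def define_states(H, k, seed=0):
    """Assign each sample (a column of H, components x samples) to one of k states."""
    H = np.asarray(H, dtype=float)
    # Standardize each component row so no single component dominates distances
    Z = (H - H.mean(axis=1, keepdims=True)) / (H.std(axis=1, keepdims=True) + 1e-12)
    labels, _ = kmeans(Z.T, k, seed=seed)
    return labels


# Two clearly separated groups of samples in a 2-component H matrix
H = np.array([[5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]])
states = define_states(H, k=2)
print(states)
```

k-means results depend on the seed and on the chosen number of states k, so a more robust pipeline would compare several runs or several values of k before settling on a set of states.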
- Before running any of the notebook chapters, complete the setup steps under Requirements below.
- To reproduce the entire analysis, run the 9 notebook chapters in sequence.
- To apply the methodology to a different oncogene, start by generating the oncogenic signature (chapter 2) from an appropriate dataset (e.g. one generated in your laboratory, one taken from the literature, or a relevant gene set).
- To explore the original KRAS mutant or global Onco-GPS maps presented in the article (e.g. to display the mRNA expression or mutation status of your favorite gene), go directly to chapter 7 or 9 and modify it to display the gene or feature of interest.
- Most chapters run in under a couple of hours. However, chapters 4 and 6 can take a few days of computer time because they run full annotation sweeps of all components and all states against many datasets of genomic features.
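A minimal sketch of a single matching step helps show why full annotation sweeps add up: it scores every feature against one target profile (for example a row of H) and, as an assumed significance procedure, estimates permutation p-values, so the work grows as features x profiles x permutations. Pearson correlation is used here as a simple swapped-in association measure; the notebooks' actual matching metric and statistics may differ, and the function match is hypothetical.

```python
import numpy as np


def _pearson(t, F):
    """Pearson correlation of vector t with each row of F."""
    tz = (t - t.mean()) / t.std()
    Fz = (F - F.mean(axis=1, keepdims=True)) / (F.std(axis=1, keepdims=True) + 1e-12)
    return Fz @ tz / len(t)


def match(target, features, n_perm=1000, seed=0):
    """Score features against one target profile, with permutation p-values.

    target:   1-D array with one value per sample (e.g. a row of H)
    features: 2-D array, features x samples (e.g. expression or 0/1 mutation calls)
    Returns (scores, p_values), one entry per feature row.
    """
    target = np.asarray(target, dtype=float)
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    scores = _pearson(target, features)
    exceed = np.zeros(len(features))
    for _ in range(n_perm):
        # Shuffling the target across samples breaks any real association
        null = _pearson(rng.permutation(target), features)
        exceed += np.abs(null) >= np.abs(scores)
    # Two-sided empirical p-value with a +1 correction so it is never exactly 0
    p_values = (exceed + 1) / (n_perm + 1)
    return scores, p_values


target = np.arange(20.0)
features = np.vstack([2 * target + 1, np.sin(np.arange(20.0)), -target])
scores, p_values = match(target, features)
print(scores.round(3), p_values.round(4))
```

With real data, this loop would run once per component or state profile and per feature dataset, each with thousands of features, which is the kind of multiplication that turns a quick computation into days. Pearson correlation only captures linear association; a rank-based or information-based measure could be substituted.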
Requirements:
Get the requirements:
In Terminal enter:
spro create -g https://github.com/UCSD-CCAL/onco_gps_paper_analysis onco_gps_paper_analysis
In Terminal enter:
cd onco_gps_paper_analysis
spro enter # starts project environment
spro install # installs project dependencies
spro download # downloads project data too large to store on GitHub
spro run notebook # opens Jupyter Notebook
Open code/ and start with notebook 1, 1 Set up data.ipynb. Then run whichever notebooks interest you.
Note: every time you want to edit or run the notebooks, you'll need to run the commands shown above.
If something's not working or you have questions, comments, or concerns, please open an issue on the GitHub repository.