Cancer survival prediction and multi-omics integration with Supervised Autoencoders, Stacked Autoencoders and Concrete Supervised Autoencoders for multiple correlated driver genes
This project evaluates the performance of multi-omics measurements for cancer survival prediction, integrating harmonized RNA sequencing from the ROSMAP cohort using supervised autoencoders paralleled with generative adversarial network (GAN) based manifold omics analysis without priors. Survival analysis is among the most important considerations during cancer therapy, because it reveals clinically significant biomarkers that can be used to stratify biological agents. Supervised encoders could facilitate accurate diagnosis of complex diseases and prediction of survival progression at multiple genetic levels. Multi-stage, dimensionality-based models may increase execution time in comparison to state-of-the-art alternatives.
Breast cancer is the second most common cancer among women in the United States. It is highly heterogeneous, composed of subtypes that differ in clinical, pathological, and molecular characteristics, as well as in prognostic and therapeutic significance. Given the significant variance in breast cancer outcomes, it is important to accurately predict the survival and prognosis of breast cancer patients, which in turn can facilitate precision medicine for breast cancer. Many deep learning methods have been proposed for cancer prognosis prediction using genomic information, but most of them focus on a single layer of omics data, most commonly gene expression (mRNA). Some existing tools use autoencoders to integrate multi-omics data for cancer prognosis prediction. Our team aims to compare these existing algorithms and to advance and optimize the methods for better utility [1].
[1] Chai, H., Zhou, X., Zhang, Z., Rao, J., Zhao, H., & Yang, Y. (2021). Integrating multi-omics data through deep learning for accurate cancer prognosis prediction. Computers in Biology and Medicine, 134, 104481.
The breast cancer datasets are publicly available from The Cancer Genome Atlas (TCGA) (https://tcga-data.nci.nih.gov/tcga/). They can be downloaded with the R package “TCGAassembler 2” v2.0.6. The datasets contain four types of multi-omics data: mRNA, miRNA, DNA methylation, and copy number variation.
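Before the four omics layers can be fed to a single model, they usually have to be aligned on the patients they share. The snippet below is a minimal sketch of that step using only the Python standard library. The function name `integrate_omics`, the patient IDs, and all values are hypothetical toy data, not the project's actual pipeline or the TCGA file format.

```python
def integrate_omics(layers):
    """Concatenate features of patients that appear in every omics layer.

    `layers` maps a layer name to a dict of patient ID -> feature list.
    (Hypothetical structure, chosen only for illustration.)
    """
    # Keep only patients measured in all layers; sort for a stable order.
    shared = sorted(set.intersection(*(set(layer) for layer in layers.values())))
    merged = {}
    for patient in shared:
        row = []
        for name in layers:  # dicts preserve insertion order
            row.extend(layers[name][patient])
        merged[patient] = row
    return merged


# Toy values standing in for the four data types named above.
layers = {
    "mRNA": {"P1": [2.1, 0.3], "P2": [1.7, 0.9], "P3": [0.4, 1.2]},
    "miRNA": {"P1": [5.0], "P2": [4.2], "P3": [3.9]},
    "methylation": {"P1": [0.81], "P2": [0.12]},  # P3 has no methylation data
    "CNV": {"P1": [1.0], "P2": [-1.0], "P3": [0.0]},
}

merged = integrate_omics(layers)
for patient, features in merged.items():
    print(patient, features)
```

Patient `P3` is dropped here because it lacks a methylation measurement. Dropping incomplete patients is one simple choice; imputing the missing layer would be an alternative.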
Feel free to take a look at the Jupyter notebook and the instructions in the forked repositories.
All requirements for the Python scripts are listed in the requirements.txt file.
The project has been tested on Linux (Ubuntu) and Windows 11.
Tools:
- Python
- Jupyter Notebooks
- R
- DeepProg package
- DCAP autoencoders
This project addresses the need for more robust cancer subtype diagnosis using deep learning methods such as DCAP (a framework to integrate multi-omics data by Denoising Autoencoder for Accurate cancer Prognosis prediction). We conclude that mRNA performs better than miRNA, methylation, and CNV (copy number variation). The constructed models could distinguish high-risk patients from low-risk patients while also identifying breast cancer-related biomarkers. There is still room for improvement in performance, and these empirical results might be used to benefit patients.
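As an illustration of how high-risk and low-risk patients can be separated once a model has produced a risk score per patient, here is a minimal sketch. It assumes a median cutoff, which is a common convention but is not stated by this project; the function name `stratify_by_risk` and the scores are hypothetical.

```python
from statistics import median


def stratify_by_risk(risk_scores):
    """Label each patient "high" or "low" risk relative to the median score.

    The median cutoff is an assumption made for this sketch.
    """
    cutoff = median(risk_scores.values())
    groups = {
        patient: ("high" if score > cutoff else "low")
        for patient, score in risk_scores.items()
    }
    return cutoff, groups


# Hypothetical model outputs (higher means worse predicted prognosis).
risk_scores = {"P1": 0.92, "P2": 0.15, "P3": 0.55, "P4": 0.31}

cutoff, groups = stratify_by_risk(risk_scores)
print(f"cutoff = {cutoff:.2f}")
print(groups)
```

The two groups can then be compared, for example with survival curves, to check whether the split is clinically meaningful.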
- Vasileios Alevizos | [email protected], [email protected] | Karolinska Institutet, iKnowHow | Team Leader
- Vanessa Xiao | [email protected] | MIT | Team Member
- Yishu Qu | [email protected] | Northwestern University | Team Member
- Alexus Acton | [email protected] | UAB | Team Member
- Zongliang Yue | [email protected] | UAB | Team Member