Cancer survival prediction and multi-omics integration with Supervised Autoencoders, Stacked Autoencoders and Concrete Supervised Autoencoders for multiple correlated driver genes

This project evaluates multi-omics measurements for cancer survival prediction, integrating harmonized RNA sequencing data from the ROSMAP cohort using supervised autoencoders in parallel with generative adversarial network (GAN) based manifold omics analysis without priors. Survival analysis is one of the most significant elements of cancer therapy, because it reveals clinically significant biomarkers that can be used to stratify biological agents. Supervised encoders could facilitate accurate diagnosis of complex diseases and prediction of survival progression at multiple genetic levels. Multi-stage dimensionality-based models may increase execution time in comparison to state-of-the-art alternatives.

Background

Breast cancer is the second most common cancer among women in the United States. It is highly heterogeneous, composed of subtypes that differ in clinical, pathological, and molecular characteristics, as well as in prognostic and therapeutic significance. Given the significant variance in breast cancer outcomes, it is important to accurately predict the survival and prognosis of breast cancer patients, which in turn can facilitate precision medicine for breast cancer. Many deep learning methods have been proposed for cancer prognosis prediction using genomic information, but most of them focus on a single layer of omics data, most commonly gene expression (mRNA). Some existing tools use autoencoders to integrate multi-omics data for cancer prognosis prediction. Our team aims to compare these existing algorithms and to advance and optimize the methods for better utility [1].

[1] Chai, H., Zhou, X., Zhang, Z., Rao, J., Zhao, H., & Yang, Y. (2021). Integrating multi-omics data through deep learning for accurate cancer prognosis prediction. Computers in Biology and Medicine, 134, 104481.

Framework and workflow

Data

The breast cancer datasets are publicly available from The Cancer Genome Atlas (TCGA) (https://tcga-data.nci.nih.gov/tcga/). They can be downloaded with the R package “TCGAassembler 2” v2.0.6. The datasets contain four types of omics data: mRNA, miRNA, DNA methylation, and copy number variation.
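
Before the four omics layers can be integrated, they generally need to be restricted to the patients that appear in every layer. The following is a minimal sketch of that alignment step, assuming each layer has already been loaded as a mapping from sample ID to a list of feature values. The function name, sample IDs, and values are hypothetical and are not taken from this repository.

```python
def align_omics(layers):
    """Keep only samples present in every omics layer and concatenate features.

    layers: dict mapping layer name -> {sample_id: [feature values]}
    Returns (sorted shared sample IDs, {sample_id: concatenated features}).
    """
    # Samples that appear in all layers
    shared = set.intersection(*(set(samples) for samples in layers.values()))
    sample_ids = sorted(shared)

    merged = {}
    for sid in sample_ids:
        row = []
        # Layers are concatenated in the order they were given,
        # so every patient ends up with the same feature layout.
        for name in layers:
            row.extend(layers[name][sid])
        merged[sid] = row
    return sample_ids, merged


# Toy data (hypothetical IDs and values): P3 has no miRNA profile
layers = {
    "mrna": {"P1": [1.0, 2.0], "P2": [0.5, 0.1], "P3": [3.0, 1.5]},
    "mirna": {"P1": [0.2], "P2": [0.9]},
    "methylation": {"P1": [0.7], "P2": [0.3], "P3": [0.4]},
    "cnv": {"P1": [-1.0], "P2": [1.0], "P3": [0.0]},
}
sample_ids, merged = align_omics(layers)
print(sample_ids)
print(merged["P1"])
```

In this toy example, P3 is dropped because it is missing from one layer. Whether to drop or impute such patients is a design choice that this sketch does not settle.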

Usage

Feel free to take a look at the Jupyter notebook and the instructions of the forked repositories.

Requirements

All requirements for the Python scripts are listed in the requirements.txt file.

The project has been tested on Linux (Ubuntu) and Windows 11.

Tools:

Python
Jupyter Notebooks
R
DeepProg package
DCAP autoencoders

Results

This project addresses the need for more robust cancer subtype diagnosis using deep learning methods such as DCAP (a framework to integrate multi-omics data by Denoising Autoencoder for Accurate cancer Prognosis prediction). We conclude that mRNA performs better than miRNA and methylation, followed by CNV (copy number variation). The constructed models could distinguish high-risk patients from low-risk patients while also identifying breast cancer related biomarkers. There is still room for performance improvements, and these empirical results might be used to benefit patients.
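
To make the high-risk versus low-risk distinction concrete, the sketch below splits patients at the median of a model's predicted risk score and measures how well the scores order survival times using Harrell's concordance index. Both the median cutoff and the C-index are assumptions chosen for illustration, and this README does not state which cutoff or metric the project used. All patient IDs, times, and scores are invented.

```python
from statistics import median


def stratify_by_median(risk_scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cutoff = median(risk_scores.values())
    return {pid: ("high" if score > cutoff else "low")
            for pid, score in risk_scores.items()}


def concordance_index(times, events, scores):
    """Harrell's C-index: share of comparable pairs that are ordered correctly.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before time j. A higher score should mean shorter survival.
    Ties in score count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Toy data (hypothetical): survival time in months, event flag (1 = death observed)
patients = ["P1", "P2", "P3", "P4"]
times = [5, 30, 12, 40]
events = [1, 0, 1, 1]
scores = [0.9, 0.7, 0.6, 0.1]

groups = stratify_by_median(dict(zip(patients, scores)))
c_index = concordance_index(times, events, scores)
print(groups)
print(round(c_index, 3))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so it offers one way to compare models trained on different omics layers.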

Team Members

Vasileios Alevizos | [email protected], [email protected] | Karolinska Institutet, iKnowHow | Team Leader
Vanessa Xiao | [email protected] | MIT | Team Member
Yishu Qu | [email protected] | Northwestern University | Team Member
Alexus Acton | [email protected] | UAB | Team Member
Zongliang Yue | [email protected] | UAB | Team Member

