Human Gene Pathway

This repository processes and combines human gene-pathway data from Pathway Commons and WikiPathways. It extracts the associations between human genes and the biological pathways recorded in these two sources and merges them into a single dataset for genomic analysis.

Execution

conda env create -f environment.yml 

conda activate human-gene-pathway

bash download.sh

python process.py

Input

  • protein_coding_gene.csv
    • A CSV file of protein-coding genes extracted from NCBI data.
  • wikipathways-Homo_sapiens.gmt
    • Downloaded from WikiPathways, this GMT (Gene Matrix Transposed) file contains curated biological pathways for Homo sapiens (human). Each line describes one pathway and the set of genes associated with it, covering specific biological processes or diseases.
    • WikiPathways is an open-source platform that offers manually curated biological pathways.
  • PathwayCommons12.All.hgnc.gmt
    • This GMT file from Pathway Commons lists pathways together with their member genes, identified by HGNC gene symbols.
    • Pathway Commons is a collection of pathways aggregated from multiple databases.
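
Both pathway inputs use the tab-separated GMT layout: a set name, a description, and then the member genes. The sketch below shows one way such a file could be read. It is not taken from process.py; the names `GeneSet`, `parse_gmt_line`, and `read_gmt` are hypothetical, and the meaning of the name and description fields differs between the two sources, so any further splitting of those fields is left out.

```python
from typing import Iterator, List, NamedTuple


class GeneSet(NamedTuple):
    """One GMT line: set name, free-text description, member genes."""
    name: str
    description: str
    genes: List[str]


def parse_gmt_line(line: str) -> GeneSet:
    """Split one tab-separated GMT line into its three parts."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 2:
        raise ValueError("a GMT line needs at least a name and a description")
    name, description, *genes = fields
    # Drop empty cells left by trailing or doubled tabs.
    return GeneSet(name, description, [gene for gene in genes if gene])


def read_gmt(path: str) -> Iterator[GeneSet]:
    """Yield one GeneSet per non-blank line of a GMT file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_gmt_line(line)
```

For example, `list(read_gmt("wikipathways-Homo_sapiens.gmt"))` would give one `GeneSet` per pathway, assuming the file has been fetched by download.sh into the working directory.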

Output

  • node_Pathway.csv.gz
    • This file lists all the pathways with their identifiers, names, and URLs.
  • edge_Gene_participatesIn_Pathway.csv.gz
    • This file describes the associations between genes and pathways.

Note: These CSV files are formatted for easy import into Neo4j.
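
As a rough illustration of how the two output files relate, the sketch below builds a pathway node table and a gene-to-pathway edge table from parsed gene sets and writes them as gzipped CSV. It is only a sketch under stated assumptions, not the logic of process.py: the function names, the column headers (`identifier`, `name`, `url`, `source_id`, `target_id`), the restriction to an allowed gene set, and the choice to drop pathways with no remaining genes are all hypothetical.

```python
import csv
import gzip


def build_tables(gene_sets, allowed_genes):
    """Return (node_rows, edge_rows) from (pathway_id, name, url, genes) tuples.

    Assumption: only genes in allowed_genes are kept, and a pathway
    with no kept genes is omitted from both tables.
    """
    nodes = {}
    edges = set()
    for pathway_id, name, url, genes in gene_sets:
        kept = [gene for gene in genes if gene in allowed_genes]
        if not kept:
            continue
        nodes[pathway_id] = (pathway_id, name, url)
        edges.update((gene, pathway_id) for gene in kept)
    # Sorting makes the output deterministic across runs.
    return sorted(nodes.values()), sorted(edges)


def write_csv_gz(path, header, rows):
    """Write a header and rows to a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

A call such as `write_csv_gz("node_Pathway.csv.gz", ["identifier", "name", "url"], nodes)` would then produce a file with one row per pathway; the headers Neo4j expects depend on the import method used, so they would need to be adapted.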

License

All original content in this repository is released as CC0 1.0 (public domain). WikiPathways data is licensed as CC BY 3.0. Reactome data is licensed as CC BY 4.0. PID data is in the public domain.