Reproducible research compendium to accompany "Cost-neutral newborn population screening for 412 genetic diseases by genome sequencing with large diplotype models"
The workflow diagram (workflow_diagram.pdf) provides a high-level depiction of the steps involved in this analysis.
Source code to run the analysis is included here as a Jupyter notebook. A Python script conversion is also provided.
- Variants of interest (currently BeginNGS v2: 53,855 P and LP variants that map to 342 genes, 412 SCGD, and 1,603 SCGD therapeutic interventions) are normally encapsulated as a fixed resource in the UDF, but can be implemented as a parameter. The variant set is pre-annotated with consequence and population frequency information, but only chr-pos-ref-alt is used for the query itself.
- Blocklist - entries classified as NSDCC (non-severe disease causing in childhood)
- MOI - mode of inheritance information
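
The way these three inputs could fit together is sketched below. This is a minimal illustration, not the notebook's actual implementation: the names `Variant`, `variant_key`, and `match_variants`, the dictionary layouts, the assumption that the blocklist is keyed by variant, and the example values (`GENE_A`, `GENE_B`, the coordinates) are all hypothetical placeholders. The only detail taken from the description above is that matching uses the chr-pos-ref-alt fields alone.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A called variant (hypothetical container, not from the notebook)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def variant_key(v: Variant) -> str:
    """Build a chr-pos-ref-alt key, the only fields used for the query."""
    return f"{v.chrom}-{v.pos}-{v.ref}-{v.alt}"


def match_variants(sample, variants_of_interest, blocklist, moi_by_gene):
    """Return sample variants that are of interest and not blocklisted.

    Assumed layouts (illustrative only):
      variants_of_interest: dict mapping key -> annotation dict with "gene"
      blocklist: set of keys classified as NSDCC
      moi_by_gene: dict mapping gene -> mode of inheritance label
    """
    hits = []
    for v in sample:
        key = variant_key(v)
        if key in blocklist:
            continue  # skip entries classified as NSDCC
        annotation = variants_of_interest.get(key)
        if annotation is None:
            continue  # not one of the variants of interest
        gene = annotation["gene"]
        hits.append({"key": key, "gene": gene, "moi": moi_by_gene.get(gene)})
    return hits


# Placeholder data, invented purely for illustration.
variants_of_interest = {
    "chr1-1000-A-G": {"gene": "GENE_A"},
    "chr2-2000-C-T": {"gene": "GENE_B"},
}
blocklist = {"chr2-2000-C-T"}
moi_by_gene = {"GENE_A": "AR", "GENE_B": "AD"}
sample = [
    Variant("chr1", 1000, "A", "G"),  # of interest, kept
    Variant("chr2", 2000, "C", "T"),  # of interest but blocklisted
    Variant("chr3", 3000, "G", "A"),  # not of interest
]

hits = match_variants(sample, variants_of_interest, blocklist, moi_by_gene)
```

Keying every lookup on a single chr-pos-ref-alt string keeps the consequence and population frequency annotations out of the matching step, so they can be carried along as metadata without affecting which variants are reported.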