DESeq2_RNA.Seqv9_LRT.Rmd

---
title: "mRNA-Sequencing Workflow"
author: "Mark E. Pepin, PhD"
date: "05/05/2019"
output:
  html_document:
    code_folding: show
    toc: yes
    toc_float: yes
  pdf_document:
    toc: yes
geometry: margin=1 in
header-includes:
- \usepackage{booktabs}
- \usepackage{longtable}
- \usepackage{array}
- \usepackage{multirow}
- \usepackage[table]{xcolor}
- \usepackage{wrapfig}
- \usepackage{float}
- \usepackage{colortbl}
- \usepackage{pdflscape}
- \usepackage{tabu}
- \usepackage{threeparttable}
mainfont: Times
fontsize: 10pt
always_allow_html: yes
---

```{r setup, include=FALSE}
library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=30),tidy=FALSE, warning = FALSE, message = FALSE, cache = TRUE)
options(knitr.kable.NA = '')

```

**Code Author**: Mark E. Pepin  
**Contact**: pepinme@uab.edu  
**Institution**: University of Alabama at Birmingham  
**Location**: 542 Biomedical Research Building 2, Birmingham, AL 35294  

# RNA-Sequencing Analysis

## Read Alignment using STAR

RNA was isolated from left ventricular endocardial tissue using the RNeasy Lipid Mini-Kit according to the manufacturer's instructions (Qiagen, Valencia, CA). High-throughput RNA sequencing was performed at the University of Utah GenomX core. Once sample read quality was checked (MultiQC analysis), the paired-end fastq files were aligned to a reference genome built from the Gencode human sequence (GRCh38.p12.genome.fa) and annotation (gencode.v28.annotation.gtf). STAR, the current gold-standard aligner for this purpose, was used for the current analysis. Before aligning each fastq file to the genome, an annotated reference genome must first be assembled. This was performed in Cheaha as `bash GenomeReference.sh`:

`STAR=../../Tools/STAR-2.5.3a/bin/Linux_x86_64/STAR`

`$STAR \`
`--runThreadN 12 \`
`--runMode genomeGenerate \`
`--genomeDir ./ \`
`--genomeFastaFiles /data/scratch/pepinme/huHrt/Input/Genome/GRCh38.p12.genome.fa \`

Alignment of short reads to this annotated genome could then proceed, using the following SLURM batch script which was submitted to the UAB *Cheaha* compute cluster (See **Appendix**). This shell script contains the following STAR alignment run settings:

`$STAR_RUN \`
`--genomeDir $GENOME_DIR \`
`--readFilesCommand zcat \`
`--readFilesIn $INPUT_DIR/fastq/${VAR}.txt.gz \`
`--sjdbGTFfile $GENOME_DIR/gencode.v28.annotation.gtf \`
`--sjdbOverhang 99 \`
`--quantMode GeneCounts \`
`--runThreadN 12 \`
`--outSAMtype BAM SortedByCoordinate \`
`--outFileNamePrefix ${RESULTS_DIR}/Alignment/${VAR}_`

## Read Count Compiling

Before the DESeq2-based differential expression can be computed, the counts generated by STAR need to be compiled, since each `ReadsPerGene.out.tab` file reports three sets of counts per gene: unstranded (column 2), forward-stranded (column 3), and reverse-stranded (column 4). Therefore, we take the fourth column of each table and merge them into a single count matrix.

```{r Count.Compile}
Count.files <- list.files(path = "../2_Input/1_RNA/Counts/", pattern = "*ReadsPerGene.out.tab", full.names = TRUE, all.files = TRUE)
Counts <- lapply(Count.files, read.table, skip = 4) #skip the first 4 rows, since these are summary data.
#Create a data.frame containing the raw counts
countData.raw <- as.data.frame(sapply(Counts, function(x) x[,4])) #selects only the 4th column as the raw counts.
#Generate column names from the file names (drop the directory and the STAR suffix)
sample.names <- gsub("ReadsPerGene[.]out[.]tab", "", basename(Count.files))
colnames(countData.raw) <- sample.names
row.names(countData.raw) <- Counts[[1]][,1]
```
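
The file layout that the chunk above relies on can be sketched with a mock `ReadsPerGene.out.tab` written to a temporary file. The gene IDs and counts below are invented for illustration only; the column meanings are those of STAR's `--quantMode GeneCounts` output.

```r
# Mock STAR ReadsPerGene.out.tab: 4 summary rows, then one row per gene.
# Columns: gene ID, unstranded, forward-stranded, reverse-stranded counts.
mock.lines <- c(
  "N_unmapped\t100\t100\t100",
  "N_multimapping\t50\t50\t50",
  "N_noFeature\t30\t400\t35",
  "N_ambiguous\t5\t1\t2",
  "ENSG_A\t120\t3\t117",   # hypothetical gene IDs and counts
  "ENSG_B\t40\t1\t39",
  "ENSG_C\t0\t0\t0"
)
mock.file <- tempfile(fileext = "_ReadsPerGene.out.tab")
writeLines(mock.lines, mock.file)

# Same call as in the workflow: skip the 4 summary rows
mock.tab <- read.table(mock.file, skip = 4)

# Column 4 holds the reverse-stranded counts used in this workflow
mock.counts <- setNames(mock.tab[, 4], as.character(mock.tab[, 1]))
mock.counts
```

If the library were unstranded or forward-stranded, column 2 or column 3 would be the appropriate choice instead.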

## Data Pre-Processing

After alignment of the fastq files to the annotated genome assembly (hg38), the first step in the analysis is to consolidate the raw data from the provided files into a data matrix that can be used to generate a normalized count matrix and differential expression dataset.

```{r tidy.data_TG.v.NTG, echo=FALSE}
library(DESeq2)
library(data.table)
library(biomaRt)
library(dplyr)
library(openxlsx)
library(Haplin)
library(pheatmap)
library(calibrate)

###Part 1: Importing the Data
##Set the experimental conditions
DESCRIPTION= "HF.v.NF"
if (!dir.exists(file.path(paste0("../3_Output/1_RNA/", DESCRIPTION)))) dir.create(file.path(paste0("../3_Output/1_RNA/", DESCRIPTION)))
#Parameters
RESPONSE=c("CON", "NR")
TIMING=c("CON","Pre")
ETIOLOGY=c("CON","ICM", "NICM")
# PAIRS = c("Paired")
# Create the countData (Input to DESeq2)
colData_all<-openxlsx::read.xlsx("../2_Input/_Patient/Index_no.outliers.xlsx", sheet = "Index_no.outliers", rowNames = T)
colData<-dplyr::filter(colData_all, RNA.Seq_ID!="")
colData_all<-colData_all[!is.na(colData_all$RNA.Seq_ID),]
rownames(colData_all)<-colData_all$RNA.Seq_ID
#Select the patient characteristics needed for the current comparison.
colData<-dplyr::filter(colData_all, Response %in% RESPONSE, Timing %in% TIMING, Etiology %in% ETIOLOGY) ##Pairing %in% PAIRS
colData$Response<-factor(colData$Response, levels = c("CON", "NR", "R"))
colData$Timing<-factor(colData$Timing, levels = c("CON", "Pre", "Post"))
colData$Etiology<-factor(colData$Etiology, levels = c("CON","NICM", "ICM"))
colData$Timing<-as.numeric(colData$Timing)
colData$Etiology<-as.numeric(colData$Etiology)
##Import Counts Data
countData<-as.data.frame(countData.raw)
countData<-countData[,colData$RNA.Seq_ID]
# write.csv(colData, "../2_Input/_Patient/colData_complete.csv")
```

### Count Normalization

DESeq2 (version 1.18.1) was used to perform the raw count normalization within R (version 3.4.2).
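
As a rough illustration of what this normalization does, the median-of-ratios size factors can be written out in base R. This is a minimal sketch on a hypothetical toy matrix (`toy.counts`), not the DESeq2 implementation itself.

```r
# Median-of-ratios size factors, written out in base R.
# `toy.counts` is a hypothetical genes x samples matrix.
toy.counts <- matrix(
  c(10, 20, 40,
    100, 200, 400,
    5, 10, 20,
    0, 3, 7),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Per-gene geometric mean across samples, kept on the log scale
log.geo.means <- rowMeans(log(toy.counts))

# Genes with a zero count in any sample have a log mean of -Inf and are skipped
usable <- is.finite(log.geo.means)

# Size factor = median ratio of each sample's counts to the geometric means
size.factors <- apply(toy.counts, 2, function(sample.counts) {
  exp(median((log(sample.counts) - log.geo.means)[usable]))
})
size.factors

# Normalized counts: divide each column by its size factor
toy.norm <- sweep(toy.counts, 2, size.factors, "/")
toy.norm
```

In this toy example each sample is sequenced at twice the depth of the previous one, so dividing by the size factors brings the first three genes to the same value in every sample.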

```{r DESeq2}
######### RUN DESeq2
dds<-DESeqDataSetFromMatrix(countData=countData, colData = colData, design= ~Timing+Etiology+Age_yrs)
dds
# dds$ICM<-relevel(dds$ICM, ref = "NICM") # setting the reference to wild-type genotype. only relevant for factors.
# dds$Timing<-relevel(dds$Timing, ref="1")
#Determine the Dispersion Relationship (determines which distribution to use for the differential analysis) - should take about 2 minutes
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
plotDispEsts(dds)
png(file=paste0("../3_Output/1_RNA/", DESCRIPTION,"/",DESCRIPTION, "_Dispersion.png"))
plotDispEsts(dds)
dev.off()
```


In the dispersion plot above, the dispersion estimates appear to decrease roughly linearly as the mean increases, so the parametric fit-type should be appropriate for the differential expression analysis; this is the fit-type passed to `DESeq()` in the next section. A 'local' fit-type would also be usable, since its run-time is not significantly longer. NOTE: If the relationship were nonlinear throughout, a 'local' nonparametric fit-type would be required.
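
For orientation, DESeq2's parametric fit-type models the trend as dispersion = asymptDisp + extraPois / mean. The sketch below only evaluates that functional form; the two coefficients are hypothetical values chosen for illustration, not estimates from this dataset.

```r
# Shape of a parametric dispersion trend: dispersion = a1 / mean + a0.
# The coefficients below are hypothetical, chosen only for illustration.
a0 <- 0.05  # asymptotic dispersion approached at high mean counts
a1 <- 2     # term that dominates at low mean counts
mean.counts <- 10^seq(0, 5, by = 1)
trend <- a1 / mean.counts + a0
data.frame(mean = mean.counts, dispersion = trend)
```

The trend falls steadily with the mean and flattens toward `a0`, which is the pattern to look for in the dispersion plot before accepting the parametric fit.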

### Differential Expression Analysis

```{r Diff-Exp}
##Pre-Filter to reduce the size of this dataset (according to the DESeq2 document recommendations)
dds <- dds[ rowSums(counts(dds)) > 1, ]
dds
################Run DESeq2 differential quantification (Likelihood ratio test (LRT) or Wald-test)
dds<-DESeq(dds, test="LRT", fitType="parametric", reduced = ~Age_yrs+Etiology) #LRT if using multiple variables
#compile the results tables
resultsNames(dds)
resdf<-as.data.frame(DESeq2::results(dds, format = "DataFrame", name = "Timing"))
resdf$ensembl_gene_id<-as.character(row.names(resdf))
```
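
The logic of the likelihood ratio test used above can be sketched on a single simulated gene. This is a simplified illustration that assumes a Poisson GLM in place of DESeq2's negative binomial model; the covariates and effect sizes are invented.

```r
# Likelihood ratio test for one simulated gene: full vs. reduced model.
# A Poisson GLM stands in for DESeq2's negative binomial GLM (assumption).
set.seed(1)
n <- 20
timing <- rep(c(0, 1), each = n / 2)   # hypothetical group indicator
age <- round(runif(n, 30, 70))         # hypothetical covariate
counts <- rpois(n, lambda = exp(3 + 0.8 * timing + 0.005 * age))

full <- glm(counts ~ timing + age, family = poisson())
reduced <- glm(counts ~ age, family = poisson())

# Test statistic: twice the gain in log-likelihood from adding `timing`
lrt.stat <- as.numeric(2 * (logLik(full) - logLik(reduced)))
# Degrees of freedom: number of coefficients dropped in the reduced model
lrt.df <- attr(logLik(full), "df") - attr(logLik(reduced), "df")
lrt.p <- pchisq(lrt.stat, df = lrt.df, lower.tail = FALSE)
c(statistic = lrt.stat, df = lrt.df, pvalue = lrt.p)
```

The term being tested is whichever one appears in the full design but not in the reduced design, which is `Timing` in the workflow above.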

Once the differential expression analysis was performed, the following were compiled into a results data matrix: Log2FoldChange, P-value, Benjamini-Hochberg adjusted P-value (Q-value), and normalized counts for each sample.

```{r Results}
####Add Annotation to the results file (this will take some time, about 5 minutes...)
##Add Gene Information
library(biomaRt)
hsapiens <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
bm <- getBM(attributes=c("ensembl_gene_id_version", "external_gene_name", "chromosome_name", "start_position", "end_position"),  mart=hsapiens)
write.csv(bm, "../2_Input/1_RNA/BiomaRt_Annotation.csv")
bm<-read.csv("../2_Input/1_RNA/BiomaRt_Annotation.csv", row.names = 1)
Results_p05.Annot<-merge(resdf, bm, by.x="ensembl_gene_id", by.y="ensembl_gene_id_version")

####Add normalized count data (for heatmap and sota)
normcount<-as.data.frame(counts(dds, normalized=TRUE))
normcount$ensembl_gene_id<-rownames(normcount)
results<-dplyr::left_join(Results_p05.Annot, normcount, by="ensembl_gene_id")
# counts_complete<-dplyr::select(results, "external_gene_name", contains("1"))
# write.csv(counts_complete,"../2_Input/1_RNA/counts_complete.csv", row.names = F)

#Create filters as tabs
results_p05<-dplyr::filter(results, pvalue<0.05)
results_q05<-dplyr::filter(results, padj<0.05)

library(openxlsx)
wb_DESeq<-createWorkbook()
#Unfiltered
  addWorksheet(wb_DESeq, "Unfiltered")
  writeData(wb_DESeq, "Unfiltered", results, startCol = 1)
#P-value Significant (0.05)
  addWorksheet(wb_DESeq, "P < 0.05")
  writeData(wb_DESeq, "P < 0.05", results_p05, startCol = 1)
#Q-value Significant (0.05)
  addWorksheet(wb_DESeq, "Q < 0.05")
  writeData(wb_DESeq, "Q < 0.05", results_q05, startCol = 1)
saveWorkbook(wb_DESeq, file = paste0("../3_Output/1_RNA/", DESCRIPTION,"/",DESCRIPTION, "_DESeq2.xlsx"), overwrite = TRUE)

```
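
Unless `pAdjustMethod` is changed, `DESeq2::results()` fills the `padj` column using the Benjamini-Hochberg method. The sketch below compares that adjustment with the stricter Bonferroni correction on a handful of made-up p-values, using base R's `p.adjust()`.

```r
# Compare Benjamini-Hochberg and Bonferroni adjustments on toy p-values.
toy.p <- c(0.001, 0.008, 0.02, 0.03, 0.5)  # hypothetical p-values
adj <- data.frame(
  pvalue = toy.p,
  BH = p.adjust(toy.p, method = "BH"),
  bonferroni = p.adjust(toy.p, method = "bonferroni")
)
adj

# Number of tests passing the 0.05 cutoff under each adjustment
sum(adj$BH < 0.05)
sum(adj$bonferroni < 0.05)
```

Because the Benjamini-Hochberg value is never larger than the Bonferroni value for the same test, a `padj < 0.05` filter retains at least as many genes as a Bonferroni cutoff would.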

## QQ Plot

Before we examined the gene networks and pathways differentially regulated in the current comparison, the first task was to determine whether the experimental groups differ by global changes in gene expression. An effective way of determining this is the QQ plot, which compares the observed P-value distribution produced by the comparison to the uniform distribution expected under the null hypothesis. Below, it is evident that the two experimental groups produce robustly divergent expression patterns consistent with a true population difference worthy of differential expression analysis.

```{r QQ-Plot}
#Create Q-Q plot
test<-results
test<-test[complete.cases(test),]
pQQ(test$pvalue, lim=c(0,10))

png(file=paste0("../3_Output/1_RNA/", DESCRIPTION,"/",DESCRIPTION, "_QQ.Plot.png"))
pQQ(test$pvalue, lim=c(0,10))
dev.off()
```
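
The construction behind a P-value QQ plot can be sketched in base R as follows. This is an illustration on simulated p-values (`toy.p`), not a reimplementation of `pQQ()`; in the workflow the input would be `test$pvalue`.

```r
# Q-Q plot of p-values against the uniform expectation under the null.
# `toy.p` is simulated: mostly null tests plus a minority of small p-values.
set.seed(42)
toy.p <- c(runif(950), rbeta(50, 0.5, 20))
observed <- -log10(sort(toy.p))
expected <- -log10(ppoints(length(toy.p)))
plot(expected, observed,
     xlab = "Expected -log10(P)", ylab = "Observed -log10(P)")
abline(0, 1, col = "red")  # points above this line indicate excess small p-values
```

Points that follow the diagonal behave as expected under the null hypothesis, while an upward departure at the right-hand side indicates more small p-values than chance alone would produce.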


## Volcano Plot

```{r Volcano}
# Load packages
library(dplyr)
library(ggplot2)
library(ggrepel)
library(openxlsx)
# Read the unfiltered DESeq2 results saved by the Results chunk
results<-read.xlsx(paste0("../3_Output/1_RNA/", DESCRIPTION, "/", DESCRIPTION, "_DESeq2.xlsx"), sheet = "Unfiltered")
results = mutate(results, sig=ifelse(results$pvalue<0.05 & abs(results$log2FoldChange)>0.585, "P < 0.05 and |FC| > 1.5", "Not Sig"), minuslogpvalue = -log10(pvalue), log2FC=log2FoldChange) # -log10 so the values match the y-axis label
max(results$minuslogpvalue, na.rm = TRUE)
#plot the ggplot
p = ggplot(results, aes(log2FC, minuslogpvalue)) + theme(panel.background = element_rect("white", colour = "black", size=2), panel.grid.major = element_line(colour = "gray50", size=.75), panel.grid.minor = element_line(colour = "gray50", size=0.4)) + 
geom_point(aes(fill=sig), colour="black", shape=21) + labs(x=expression(Log[2](Fold-Change)), y=expression(-Log[10](P-value))) + xlim(-7,7)+ ylim(0, max(results$minuslogpvalue, na.rm = TRUE)) + geom_hline(yintercept = 0, size = 1) + geom_vline(xintercept=0, size=1)+ 
scale_fill_manual(values=c("black", "tomato"))
#add a repelling effect to the text labels.
p+geom_text_repel(data=filter(results, minuslogpvalue>10 & abs(log2FoldChange)>2.5 | minuslogpvalue>7 & abs(log2FoldChange)>4.5), aes(label=external_gene_name))

pdf(file = paste0("../3_Output/1_RNA/", DESCRIPTION, "/", DESCRIPTION, "Volcano.Plot.pdf"))
p+geom_text_repel(data=filter(results, minuslogpvalue>5 & abs(log2FoldChange)>2.5 | minuslogpvalue>7 & abs(log2FoldChange)>4.5), aes(label=external_gene_name))
dev.off()

```
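
The significance flag used for the volcano plot can be checked on a small example. The data frame below is hypothetical; the cutoffs mirror those in the chunk above, where 0.585 is approximately log2(1.5).

```r
# Classify genes the same way as the `sig` column: P < 0.05 and |FC| > 1.5.
# `toy.res` is a hypothetical results table.
fc.cutoff <- log2(1.5)  # about 0.585 on the log2 scale
toy.res <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  log2FoldChange = c(1.2, -0.9, 0.3, -2.0),
  pvalue = c(0.001, 0.04, 0.0001, 0.2),
  stringsAsFactors = FALSE
)
toy.res$sig <- ifelse(toy.res$pvalue < 0.05 & abs(toy.res$log2FoldChange) > fc.cutoff,
                      "P < 0.05 and |FC| > 1.5", "Not Sig")
toy.res$minuslog10p <- -log10(toy.res$pvalue)
toy.res
```

Both conditions must hold: `geneC` has a very small p-value but too small a fold-change, and `geneD` has a large fold-change but is not significant, so neither is flagged.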


## Heatmap Visualization (Q < 0.01)

In order to visualize the distribution of differentially expressed genes, as well as determine the effect of various heart failure etiologies on transcription, hierarchical clustering and heatmap visualization were performed. The first heatmap uses normalized counts of genes at the Q < 0.01 statistical level (`padj < STATISTIC`); the second uses variance-stabilized counts of genes at P < 0.05. This analysis reveals that P < 0.05 is sufficient to separate all samples by genotype.


```{r heatmap}
STATISTIC=0.01
library(pheatmap)
results_p05<-filter(results, padj<STATISTIC)
hm_data<-data.matrix(dplyr::select(results_p05, starts_with("1")))
rownames(hm_data)<-results_p05$external_gene_name
##
##Index file for annotating samples
rownames(colData)<-colData$RNA.Seq_ID
Index<-dplyr::select(colData, Timing, Etiology, Response, Age_yrs)
Index<-as.data.frame(Index)

paletteLength <- 100
myColor <- colorRampPalette(c("dodgerblue4", "white", "gold2"))(paletteLength)
pheatmap(hm_data,
         cluster_cols=T, 
         border_color=NA, 
         cluster_rows=T, 
         scale = 'row',
         show_colnames = T, 
         show_rownames = F, 
         color = myColor,
         annotation_col = Index,
         filename=paste0("../3_Output/1_RNA/", DESCRIPTION,"/",DESCRIPTION, "_Heatmap_Normcount.P01.pdf"))

vst<-varianceStabilizingTransformation(dds)
vst<-assay(vst)
#Dendrogram
# library("dendextend")
# dists <- dist(t(vst))
# hc<-hclust(dists)
# dend<-as.dendrogram(hc)
# n_etiology <- length(unique(colData$Timing))
# cols_4 <- colorspace::rainbow_hcl(n_etiology, c = 70, l  = 50)
# col_car_type <- cols_4[as.factor(colData$Timing)]
# labels_colors(dend) <- col_car_type[order.dendrogram(dend)]
# dend <- color_branches(dend, k = 2) %>% set("branches_lwd", 2)
# k234 <- cutree(dend, k = 2:4)
# par(mar = c(12,4,1,1))
# plot(dend)
# colored_bars(cbind(k234[,3:1], col_car_type), dend, rowLabels = c(paste0("k = ", 4:2), "Condition"))
# 
# pdf(paste0("../3_Output/1_RNA/", DESCRIPTION,"/",DESCRIPTION, "_Unsupervised Dendrogram.pdf"), height = 4, width = 5)
# par(mar = c(12,4,1,1))
# plot(dend)
# colored_bars(cbind(k234[,3:1], col_car_type), dend, rowLabels = c(paste0("k = ", 4:2), "Condition"))
# dev.off()

normhm<-vst[row.names(resdf[which(resdf$pvalue<0.05),]),]
normhm<-scale(t(normhm))
normhm<-t(normhm)
pheatmap(normhm,
         cluster_cols=T, 
         clustering_method = "ward.D2",
         border_color=NA, 
         cluster_rows=T, 
         scale = 'row', 
         show_colnames = T, 
         show_rownames = F, 
         color = myColor,
         annotation_col = Index,
         filename=paste0("../3_Output/1_RNA/", DESCRIPTION,"/",DESCRIPTION, "_VST.Heatmap.P01.pdf"))
```
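
The row-wise scaling applied before the second heatmap can be illustrated on a small matrix. This is a minimal sketch with hypothetical values (`toy.vst`); it also shows that scaling rows a second time, as `scale = 'row'` does inside `pheatmap()`, leaves already-scaled values unchanged.

```r
# Row-wise z-scores: scale() works on columns, so transpose before and after.
# `toy.vst` is a hypothetical genes x samples matrix.
toy.vst <- matrix(c(1, 2, 3,
                    10, 20, 60),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("geneA", "geneB"), paste0("s", 1:3)))
toy.z <- t(scale(t(toy.vst)))
rowMeans(toy.z)        # each row is centered at 0
apply(toy.z, 1, sd)    # and has unit standard deviation

# Scaling a second time leaves the z-scores unchanged
toy.z2 <- t(scale(t(toy.z)))
all.equal(toy.z, toy.z2, check.attributes = FALSE)
```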


## Principal Components Analysis

Once we established that the populations under consideration truly display divergent expression patterns, we sought to determine whether unbiased global gene expression patterns recapitulate the described phenotypes within each heart failure group. To accomplish this, an unsupervised Principal Components Analysis (PCA) was initially used with normalized counts.

### PCA Features

Before running the principal components analysis, it was necessary to first determine the number of PCs required to account for 80% of the variance, a commonly used machine-learning benchmark for retaining enough information to have confidence in the analysis.

```{r PCA_Features}
#Plot Features of the PCA
library(dplyr)
library(plotly)
##Import the data to be used for PCA
results_DEG<-dplyr::select(results, contains("1"))
results_DEG<-results_DEG[order(-rowSums(results_DEG)),]
#transpose the dataset (required for PCA)
data.pca<-t(results_DEG)
data.pca<-as.data.frame(data.pca)
##Import the data to be used for annotation
rownames(colData)<-colData$RNA.Seq_ID
Index<-colData
Index<-as.data.frame(Index)
##merge the file
data.pca_Final<-merge(Index, data.pca, by=0)
rownames(data.pca_Final)<-data.pca_Final$Row.names
pca.comp<-prcomp(data.pca_Final[,(ncol(Index)+2):ncol(data.pca_Final)])

pcaCharts=function(x) {
    x.var <- x$sdev ^ 2
    x.pvar <- x.var/sum(x.var)
    par(mfrow=c(2,2))
    plot(x.pvar,xlab="Principal component", 
         ylab="Proportion of variance", ylim=c(0,1), type='b')
    plot(cumsum(x.pvar),xlab="Principal component", 
         ylab="Cumulative Proportion of variance", 
         ylim=c(0,1), 
         type='b')
    screeplot(x)
    screeplot(x,type="l")
    par(mfrow=c(1,1))
}
pcaCharts(pca.comp)

png(file=paste0("../3_Output/1_RNA/", DESCRIPTION,"/",DESCRIPTION, "_PCA.Charts.png"))
pcaCharts(pca.comp)
dev.off()

```
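
The number of components needed to reach the 80% threshold can also be read off numerically rather than from the charts. The sketch below uses a simulated matrix (`toy.data`) as a stand-in; with the workflow's objects, `toy.pca` would be replaced by `pca.comp`.

```r
# Count the principal components needed to reach 80% cumulative variance.
# `toy.data` is simulated for illustration only.
set.seed(7)
toy.data <- matrix(rnorm(20 * 6), nrow = 20)
toy.data[, 1] <- toy.data[, 1] * 10   # give one direction most of the variance
toy.pca <- prcomp(toy.data)

prop.var <- toy.pca$sdev^2 / sum(toy.pca$sdev^2)  # proportion per component
cum.var <- cumsum(prop.var)                       # cumulative proportion
n.pcs <- which(cum.var >= 0.80)[1]                # first component reaching 80%
n.pcs
```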

### 3-Dimensional PCA

From the previous calculations, it is seen that only 2 principal components are necessary (accounting for >80% cumulative variance). Nonetheless, below is a 3-D PCA to ensure that all groups are characterized with a higher degree of stringency.

```{r PCA-Summary}
##Create a 3D-PCA for Inspection
library(plotly)
##Index
Index_PCA<-read.xlsx("../2_Input/_Patient/Index_no.outliers.xlsx", sheet = "Index_Matrix", rowNames = T)
PCs<-merge(pca.comp$x, Index, by=0)
rownames(PCs)<-PCs$Row.names
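#NOTE: the Timing column, used for marker color below, is overwritten with the numeric Etiology coding, so points are colored by Etiology
PCs$Timing<-as.numeric(as.factor(PCs$Etiology))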
ax_text<-list(
  family = "times",
  size = 12,
  color = "black")
t <- list(
  family = "times",
  size = 14,
  color = "black")
p <- plot_ly(PCs, x = ~PC1, y = ~PC2, z = ~PC3,
   marker = list(color = ~Timing, 
                 colorscale = c('#FFE1A1', '#683531'), 
                 showscale = TRUE),
  text=rownames(PCs)) %>%
  add_markers() %>% 
  layout(scene = list(
     xaxis = list(title = 'PC1', zerolinewidth = 4, 
        zerolinecolor="darkgrey", linecolor="darkgrey", 
        linewidth=4, titlefont=t, tickfont=ax_text),
     yaxis = list(title = 'PC2', zerolinewidth = 4, 
        zerolinecolor="darkgrey", linecolor="darkgrey", 
        linewidth=4, titlefont=t, tickfont=ax_text),
    zaxis = list(title = 'PC3', zerolinewidth = 4, 
        zerolinecolor="darkgrey",  linecolor="darkgrey", 
        linewidth=4, titlefont=t, tickfont=ax_text)),
  annotations = list(
           x = 1.13,
           y = 1.03,
           text = 'Response',
           xref = '1',
           yref = '0',
           showarrow = FALSE))
p #must comment out for PDF generation via knitr (Pandoc)
```

# IPA Upload

```{r IPA}
#Import data
library(openxlsx)
library(dplyr)
DEGs_NR.PrevCON<-read.xlsx("../3_Output/1_RNA/IPA/NonRespond-Pre.v.CON_DESeq2.xlsx", sheet = "Unfiltered")
DEGs_NR.PrevCON$log2FC_NR.PrevCON<-DEGs_NR.PrevCON$log2FoldChange
DEGs_NR.PrevCON$pvalue_NR.PrevCON<-DEGs_NR.PrevCON$pvalue
DEGs_NR.PrevCON$padj_NR.PrevCON<-DEGs_NR.PrevCON$padj
DEGs_NR.PrevCON<-dplyr::select(DEGs_NR.PrevCON, external_gene_name, contains("log2FC"), contains("pvalue_"), contains("padj_"))

DEGs_R.PrevCON<-read.xlsx("../3_Output/1_RNA/IPA/Respond-Pre.v.CON_DESeq2.xlsx", sheet = "Unfiltered")
DEGs_R.PrevCON$log2FC_R.PrevCON<-DEGs_R.PrevCON$log2FoldChange
DEGs_R.PrevCON$pvalue_R.PrevCON<-DEGs_R.PrevCON$pvalue
DEGs_R.PrevCON$padj_R.PrevCON<-DEGs_R.PrevCON$padj
DEGs_R.PrevCON<-dplyr::select(DEGs_R.PrevCON, external_gene_name, contains("log2FC"), contains("pvalue_"), contains("padj_"))

DEGs_NR.PostvCON<-read.xlsx("../3_Output/1_RNA/IPA/NonRespond-Post.v.CON_DESeq2.xlsx", sheet = "Unfiltered")
DEGs_NR.PostvCON$log2FC_NR.PostvCON<-DEGs_NR.PostvCON$log2FoldChange
DEGs_NR.PostvCON$pvalue_NR.PostvCON<-DEGs_NR.PostvCON$pvalue
DEGs_NR.PostvCON$padj_NR.PostvCON<-DEGs_NR.PostvCON$padj
DEGs_NR.PostvCON<-dplyr::select(DEGs_NR.PostvCON, external_gene_name, contains("log2FC"), contains("pvalue_"), contains("padj_"))

DEGs_R.PostvCON<-read.xlsx("../3_Output/1_RNA/IPA/Respond-Post.v.CON_DESeq2.xlsx", sheet = "Unfiltered")
DEGs_R.PostvCON$log2FC_R.PostvCON<-DEGs_R.PostvCON$log2FoldChange
DEGs_R.PostvCON$pvalue_R.PostvCON<-DEGs_R.PostvCON$pvalue
DEGs_R.PostvCON$padj_R.PostvCON<-DEGs_R.PostvCON$padj
DEGs_R.PostvCON<-dplyr::select(DEGs_R.PostvCON, external_gene_name, contains("log2FC"), contains("pvalue_"), contains("padj_"))

#Merge them fully

# IPA_Import.p05<-full_join(DEGs_NR.PrevCON, DEGs_R.PrevCON)
# IPA_Import.p05<-full_join(IPA_Import.p05, DEGs_NR.PostvCON)
# IPA_Import.p05<-full_join(IPA_Import.p05, DEGs_R.PostvCON)
# write.xlsx(IPA_Import.p05, "../3_Output/1_RNA/IPA/IPA_Import.p05.xlsx")
```
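
The rename-and-merge pattern repeated in the chunk above can be condensed into a helper. This is a minimal sketch on toy data frames: `suffix_stats` is a hypothetical function name, and base R `merge(all = TRUE)` is used here as a stand-in for `dplyr::full_join`.

```r
# Suffix each comparison's statistics, then merge all tables by gene name.
# `suffix_stats` is a hypothetical helper; the data frames are toy examples.
suffix_stats <- function(df, label) {
  out <- df[, c("external_gene_name", "log2FoldChange", "pvalue", "padj")]
  names(out)[-1] <- paste0(c("log2FC_", "pvalue_", "padj_"), label)
  out
}

toy.a <- data.frame(external_gene_name = c("geneA", "geneB"),
                    log2FoldChange = c(1.5, -0.7), pvalue = c(0.01, 0.2),
                    padj = c(0.04, 0.5), stringsAsFactors = FALSE)
toy.b <- data.frame(external_gene_name = c("geneB", "geneC"),
                    log2FoldChange = c(-1.1, 2.3), pvalue = c(0.03, 0.001),
                    padj = c(0.09, 0.01), stringsAsFactors = FALSE)

tables <- list(suffix_stats(toy.a, "NR.PrevCON"), suffix_stats(toy.b, "R.PrevCON"))

# Full outer merge: genes missing from one comparison get NA in its columns
merged <- Reduce(function(x, y) merge(x, y, by = "external_gene_name", all = TRUE), tables)
merged
```

Genes absent from a comparison appear with `NA` in that comparison's columns, which matters for any later step that filters or replaces missing values.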

# Create correlation graphs

```{r Correlation}
library(ggplot2)
library(ggrepel)
IPA_Import.p05<-openxlsx::read.xlsx("../3_Output/1_RNA/IPA/IPA_Import.p05.xlsx")
#Filter on the observed p-values first, so that missing values are not counted as P = 0
IPA_Import.p05<-dplyr::filter(IPA_Import.p05, pvalue_NR.PrevCON<0.05)
IPA_Import.p05[is.na(IPA_Import.p05)]<-0
IPA_Import.p05$Resid_NR.Pre.v.Post<-resid(lm(IPA_Import.p05$log2FC_NR.PostvCON~IPA_Import.p05$log2FC_NR.PrevCON))
#Create ggplot figures
g.cor<-ggplot(IPA_Import.p05, aes(log2FC_NR.PrevCON,log2FC_NR.PostvCON)) + geom_point(shape = 1) + geom_smooth(method = loess) + theme_linedraw()
#Label with outliers based on residuals
g.cor+geom_text_repel(data=filter(IPA_Import.p05, abs(Resid_NR.Pre.v.Post)>3.5), aes(label=external_gene_name))

p = ggplot(IPA_Import.p05, aes(log2FC, minuslogpvalue)) + theme(panel.background = element_rect("white", colour = "black", size=2), panel.grid.major = element_line(colour = "gray50", size=.75), panel.grid.minor = element_line(colour = "gray50", size=0.4)) + 
geom_point(aes(fill=sig), colour="black", shape=21) + labs(x=expression(Log[2](Fold-Change)), y=expression(-Log[10](P-value))) + xlim(-5,5)+ ylim(-0, max(results$minuslogpvalue, na.rm = TRUE)) + geom_hline(yintercept = 0, size = 1) + geom_vline(xintercept=0, size=1)+ 
scale_fill_manual(values=c("black", "tomato"))

```
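
The residual-based labeling used for the correlation plot can be sketched on simulated fold-changes. The gene names, the planted outlier, and the data are all hypothetical; only the cutoff of 3.5 is taken from the chunk above.

```r
# Flag genes whose response departs from a linear fit, using lm() residuals.
# Simulated fold-changes; gene names are hypothetical.
set.seed(3)
toy.fc <- data.frame(gene = paste0("gene", 1:50),
                     pre = rnorm(50, sd = 2),
                     stringsAsFactors = FALSE)
toy.fc$post <- 0.8 * toy.fc$pre + rnorm(50, sd = 0.3)
toy.fc$post[5] <- toy.fc$post[5] + 6   # plant one obvious outlier

fit <- lm(post ~ pre, data = toy.fc)
toy.fc$resid <- resid(fit)

# Genes further than 3.5 log2 units from the fitted line
outliers <- toy.fc$gene[abs(toy.fc$resid) > 3.5]
outliers
```

The residual measures how far a gene's second fold-change lies from what the first fold-change would predict, so labeled genes are those that respond differently between the two comparisons.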


# Supplemental Table: R Session Information

All packages and settings were acquired using the following command:

```{r settings}
library(kableExtra)
sinfo<-devtools::session_info()
sinfo$platform
sinfo$packages %>% kable( 
                         align="c", 
                         longtable=T, 
                         booktabs=T,
                         caption="Packages and Required Dependencies") %>% 
    kable_styling(latex_options=c("striped", "repeat_header", "condensed"))
```