forked from devonorourke/sus19mb
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmarkdown_coOccur.Rmd
577 lines (450 loc) · 20.7 KB
/
markdown_coOccur.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
---
title: "UCDavisHackaton"
output: pdf_document
---
```{r setup_1, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document to make a co-occurence network analysis and stats.
In these analyses, presences of OTUs are pair-wise compared across the different microbial samples. To limit the number of statistical comparisons, and thus constrain the number of potential false positives, OTUs that occurred in less than 5 samples will not be considered for network inferences. To establish edges (i.e. connections) between OTUs, and at the same time account for the limitations of scoring measures (Aitchison 1981; Legendre and Legendre, 1983), an ensemble method was used, based on non-parametric Spearman rank correlation and the Kullback-Leibler dissimilarity measure, as proposed by Faust et al. (2012).
To start, initiate and create the working environment. This code will also install libraries if not yet there.
```{r setup_2, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(phyloseq)
library(ggplot2)
library(RColorBrewer)
library(tidyverse)
library(gplots)
library(vegan)
library(cooccur)
library(igraph)
library(flexmix)
library(tidyr)
library(randomcoloR)
```
## Creating a phyloseq object
### Read in some data
Here we're importing data that is more or less an otu table and a taxonomy table. These steps will be different if you use your own data.
```{r create_otu_1}
taxatable_raw <- read_csv('bact_alldata_taxatable_wTax.csv')
head(taxatable_raw)
```
As you can see, there is a missing column name, and R filled it in 'X1'.
For the otu table, we want to have the OTU names as row names, and get rid of the taxa column. The final command turns the data frame into an otu object.
```{r create_otu_2}
otu <- as.data.frame(select(taxatable_raw, -X1,-taxonomy))
row.names(otu) <- taxatable_raw$X1
otu <- otu_table(otu, taxa_are_rows = T)
```
The taxonomy is in the same file, but phyloseq will want it separate. Further, we want to separate all the phylogenetic levels into separate columns. Row names should be the OTU ids. Phyloseq also wants the taxonomy to be a matrix, before converting to a taxonomy table.
```{r create_taxa_1}
taxonomy <- data.frame(taxonomy=taxatable_raw$taxonomy)
row.names(taxonomy)<- taxatable_raw$X1
# Split taxonomy into separate columns
taxonomy <- data.frame(separate(taxonomy, col=taxonomy, into= c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species"),
sep = ";"))
# Change spaces to NA for missing data
taxonomy <- apply(taxonomy, 2, function(x) gsub("^$|^ $", NA, x))
taxonomy <- as.matrix(taxonomy)
taxonomy <- tax_table(taxonomy)
```
Finally, we want to get the sample meta data.
```{r create_meta_1}
map <- read_csv('bact_alldata_mapfile.csv')
head(map)
```
Again, there is a missing column header. Here, we change it to 'Sample'. We also want to convert some columns which were coded as integers into factors.
Convert into sample data object.
```{r create_meta_2}
names(map)[1] <- 'Sample'
row.names(map) <- map$Sample
map[7:9] <- lapply(map[7:9] , factor)
head(map)
meta <- sample_data(map)
```
Finally, we can put it all together into a phyloseq object.
```{r create_phyloseq}
ps <- phyloseq(otu, taxonomy, meta)
```
... some further data formatting
``` {r format_data}
bacOTU<-data.frame(otu_table(ps))
bacTAX<-data.frame(tax_table(ps))
bacTAX$Domain <- gsub('k__', '', bacTAX$Domain, fixed=TRUE)
bacTAX$Phylum <- gsub('p__', '', bacTAX$Phylum, fixed=TRUE)
bacTAX$Class <- gsub('c__', '', bacTAX$Class, fixed=TRUE)
bacTAX$Order <- gsub('o__', '', bacTAX$Order, fixed=TRUE)
bacTAX$Family <- gsub('f__', '', bacTAX$Family, fixed=TRUE)
bacTAX$Genus <- gsub('g__', '', bacTAX$Genus, fixed=TRUE)
bacTAX$Species <- gsub('s__', '', bacTAX$Species, fixed=TRUE)
bacTAX[bacTAX=='']<-'Unassigned'
bacTAX[is.na(bacTAX)] <- 'Unassigned'
head(bacTAX)
```
``` {r TEST}
##THIS STEP ONLY FOR TESTING
# only look at first XX OTUs to cut computation time: only to test code!!!
bacOTU_orig<-bacOTU
bacTAX_orig<-bacTAX
bacOTU<-bacOTU_orig[1:1000,]
bacTAX<-bacTAX_orig[1:1000,]
```
## Loading functions
Before we can get started, we need to load several custom functions, adapted from Karoline Faust (original code can be found at http://psbweb05.psb.ugent.be/conet/documents/networksFromSimCounts.R)
Before computing the Kullback-Leibler dissimilarity, OTU-counts should be normalized by dividing them by their total across the different samples.
These functions are used to compute the Kullback-Leibler dissimilarity and asses it's statistical significance, as described in Faust et al. (2012). To calculate a measure-specific p-value, bootstrap score distributions with 1000 iterations are compared to point null values that are computed by permutation (n=1000).
```{r functions_kld}
compute.kld=function(x, pseudocount=0.00000001){
# diagonal is zero
kld=matrix(data=0,nrow=nrow(x),ncol=nrow(x))
for(i in 1:nrow(x)){
for(j in 1:i){
kld[i,j]=get.kld(x[i,],x[j,], pseudocount=pseudocount)
kld[j,i]=kld[i,j]
}
}
kld
}
get.kld=function(x,y, pseudocount=0.00000001){
if(length(x) != length(y)){
stop("The two vectors should have the same length!")
}
x[x==0]=pseudocount
y[y==0]=pseudocount
dis = 0
x = x/sum(x)
y = y/sum(y)
for(i in 1:length(x)){
if(!is.nan(x[i]) && !is.nan(y[i])){
ratioxy = log(x[i]/y[i])
ratioyx = log(y[i]/x[i])
dis = x[i]*ratioxy+y[i]*ratioyx + dis
}
}
dis
}
get.pval = function(matrix, x.index, y.index, N.rand=1000, method="spearman", renorm=F, permutandboot=F, plot=F, verbose=F) {
x = matrix[x.index,]
y = matrix[y.index,]
lower.tail = TRUE
# bray and kld are dissimilarities, so one-sided p-value needs to be computed from the upper tail
if(method == "bray" || method == "kld"){
lower.tail = FALSE
}
if(method == "spearman"){
this.sim = cor(x, y, use="complete.obs", method="spearman")
}else if(method == "pearson"){
this.sim = cor(x, y, use="complete.obs", method="pearson")
}else if(method == "bray"){
this.sim= vegdist(rbind(x,y),method="bray")
}else if(method == "kld"){
this.sim=get.kld(x,y)
}else{
stop("Select either spearman, pearson, kld or bray as method.")
}
rand.sim = rep(NA, N.rand)
boot.sim = rep(NA, N.rand)
for (i in 1:N.rand) {
rand = sample(x, length(x))
if(renorm == T){
mat.copy=matrix
mat.copy[x.index,]=rand
mat.copy = normalize(mat.copy)
rand = mat.copy[x.index,]
y = mat.copy[y.index,]
}
if(method == "spearman"){
rand.sim[i] = cor(rand, y, method="spearman", use="complete.obs")
}else if(method == "pearson"){
rand.sim[i] = cor(rand, y, method="pearson",use="complete.obs")
}else if(method == "bray"){
rand.sim[i] = vegdist(rbind(rand,y),method="bray")
}else if(method == "kld"){
rand.sim[i] = get.kld(rand,y)
}
}
rand.sim = na.omit(rand.sim)
if(plot == T){
col1=rgb(0,0,1,1/3)
col2=rgb(1,0,0,1/3)
hist(rand.sim,col=col1)
abline(v=mean(rand.sim),col="blue")
}
if(permutandboot){
x=matrix[x.index,]
y=matrix[y.index,]
for (i in 1:N.rand) {
rand.idx = sample(1:length(x),replace=TRUE)
x.boot=x[rand.idx]
y.boot=y[rand.idx]
if(method == "spearman"){
boot.sim[i] = cor(x.boot, y.boot, method="spearman", use="complete.obs")
}else if(method == "pearson"){
boot.sim[i] = cor(x.boot, y.boot, method="pearson",use="complete.obs")
}else if(method == "bray"){
boot.sim[i] = vegdist(rbind(x.boot,y.boot),method="bray")
}else if(method == "kld"){
boot.sim[i] = get.kld(x.boot,y.boot)
}
}
boot.sim = na.omit(boot.sim)
if(plot == T){
hist(boot.sim,col=col2,add=T)
abline(v=mean(boot.sim),col="red")
legend(x="topleft", c("Permut","Boot"), bg="white",col=c(col1,col2),lty=rep(1,2),merge=T)
}
# if we got enough non-NA permutation and bootstrap values, compute p-value
if(length(rand.sim) > round(N.rand/3) && length(boot.sim) > round(N.rand/3)){
pval = pnorm(mean(rand.sim),mean=mean(boot.sim),sd=sd(boot.sim), lower.tail=lower.tail)
}else{
pval = 0.5
}
}else{
# if we got enough non-NA permutation values, compute p-value
if(length(rand.sim) > round(N.rand/3)){
if (lower.tail) {
pval = (sum(this.sim > rand.sim) / length(rand.sim))
} else {
pval = (sum(this.sim < rand.sim) / length(rand.sim))
}
}else{
pval = 0.5
}
}
# set missing value (from constant vector) to intermediate p-value (worst possible p-value in this context)
if(is.na(pval)){
pval = 0.5
}
# p-values are one-sided, so high p-values signal mutual exclusion and are converted into low ones
if(pval > 0.5){
pval = 1 - pval
}
if(verbose == T){
print(paste("p-value =",pval))
print(paste("original score",this.sim))
print(paste("mean of null distrib",mean(rand.sim)))
print(paste("sd of null distrib",sd(rand.sim)))
if(permutandboot == T){
print(paste("mean of boot distrib",mean(boot.sim)))
print(paste("sd of boot distrib",sd(boot.sim)))
}
}
pval
}
```
# Getting started with the network analysis
First, we will need to make the data presence-absence, this is because the abundances can be skewed or biased. Samples need to be rows.
```{r presence_absence}
bacOTU_prab<-data.frame(t(bacOTU))
bacOTU_prab[bacOTU_prab>1]<-1
```
second, we need to remove non-informative OTUs, being those occuring in less than 5 samples, or occuring in all samples.
```{r non_informatives}
data_coOccur<-bacOTU_prab[,colSums(bacOTU_prab>0) >=5]
data_coOccur<-data_coOccur[,colSums(data_coOccur)<nrow(data_coOccur)]
```
# Creating the network matrix
this code will create the matrices to store the output of the co-occurence analysis
```{r network_1}
netmatrix <- data.frame(matrix(NA,nrow=ncol(data_coOccur),ncol=ncol(data_coOccur)))
rownames(netmatrix)<-colnames(data_coOccur)
colnames(netmatrix)<-colnames(data_coOccur)
p_matrx <- netmatrix
kldmatrix <- netmatrix
p_matrxkld <- netmatrix
```
Then, the correlations can be calculated.
NOTICE! THIS STEP IS VERY COMPUTATIONALLY INTENSIVE, MAY TAKE HOURS TO DAYS
```{r network_2, message=FALSE}
for (sp1 in colnames(data_coOccur)){
for (sp2 in colnames(data_coOccur)){
#spearman
if(sp1!=sp2){
cor_out<-cor.test(data_coOccur[,sp1], data_coOccur[,sp2], method="spearman")
p_matrx[sp1, sp2]<-cor_out$p.value
#Kullback-Leibler distance
y<-t(as.matrix(data.frame(spp1=data_coOccur[,sp1]/sum(data_coOccur[,sp1]),spp2=data_coOccur[,sp2]/sum(data_coOccur[,sp2]))))
kld_value<-compute.kld(y)[1,2]
pval_kld<-as.numeric(get.pval(y, 1, 2, N.rand=1000, permutandboot=F, method="kld"))
p_matrxkld[sp1, sp2]<-pval_kld
if (cor_out$p.value > 0.05 | pval_kld > 0.05){
netmatrix[sp1,sp2] <- as.numeric(0)
kldmatrix[sp1,sp2] <- as.numeric(0)
}
else{
netmatrix[sp1,sp2] <- cor_out$estimate
kldmatrix[sp1,sp2] <- kld_value
}
}
else{
netmatrix[sp1,sp2] <- as.numeric(0)
kldmatrix[sp1,sp2] <- as.numeric(0)
p_matrx[sp1,sp2] <- as.numeric(0)
p_matrxkld[sp1,sp2] <- as.numeric(0)
}
}
}
```
because the analysis takes so much time, it's best to save the outcome here before we continue
```{r network_3}
write.csv2(netmatrix, "UCDavisCourseNetmatrix.csv")
```
# Correcting p-values with the Benjamini-hochberg method and remove new non-significants correlations
To further reduce the number of false positives, the Benjamini-Hochberg method (Benjamini and Hochberg, 1995) was implemented to correct for multiple testing. We also need to execute this for the two correlation methods that were used (Spearman and kld)
After this correction, edges with an adjusted p-value above 0.05 will be discarded. This way, only edges remain that were significant for both the Spearman correlation and kld dissimilarity measures.
```{r pvalues}
netmatrix2<-netmatrix
pvalsadj2<-p.adjust(unlist(p_matrxkld), 'BH')
pvalsadj2<-matrix(nrow=nrow(p_matrxkld), ncol=ncol(p_matrxkld), data=pvalsadj2)
for(i in 1:nrow(pvalsadj2)){
for(j in 1: ncol(pvalsadj2)){
if(pvalsadj2[i,j]>0.05){
netmatrix2[i,j]<-as.numeric(0)
}
}
}
pvalsadj2<-p.adjust(unlist(p_matrx), 'BH')
pvalsadj2<-matrix(nrow=nrow(p_matrx), ncol=ncol(p_matrx), data=pvalsadj2)
for(i in 1:nrow(pvalsadj2)){
for(j in 1: ncol(pvalsadj2)){
if(pvalsadj2[i,j]>0.05){
netmatrix2[i,j]<-as.numeric(0)
}
}
}
```
# Going through the edges
after these corrections, we need to remove OTUs with no edges
```{r edges_1}
netmatrix_parse<-netmatrix[,colSums(netmatrix)>0]
netmatrix_parse<-netmatrix_parse[rowSums(netmatrix_parse)>0,]
```
We should also discard edges which have low correlation coeficients (both positive and negative). Choosing this cutoff is an arbitrary descision. The more stringent it is, the fewer edges, but the higher the chance the correlation is indicating a true correlation. In this step, we also change the correlation matrix to a matrix with presence or absences of edeges (1 or 0).
In this example, only edges that were significantly supported by both methods and with a Spearman rank correlation >0.6 were retained in the network (Barber??n et al., 2012).
```{r edges_2}
netmatrix_parse[netmatrix_parse>=0.60]<-1
netmatrix_parse[netmatrix_parse<0.60]<-0
netmatrix_parse<-netmatrix_parse[,colSums(netmatrix_parse)>0]
netmatrix_parse<-netmatrix_parse[rowSums(netmatrix_parse)>0,]
```
# Visualizing the network
This code will create the network object to be used in igraph.
```{r igraph}
subi_network <- graph.adjacency(as.matrix(netmatrix_parse),mode="undirected",weighted=NULL)
```
We can use taxonomy as a vertex object (vertices are the same as the nodes). We will also assign a random colour to each phylum.
```{r tax_to_vertex_1}
tax_sub<-bacTAX[,c("Phylum", "Class")]
colnames(tax_sub)<-c("Phylum", "Class")
tax_sub<-tax_sub[names(netmatrix_parse),]
V(subi_network)$Phylum=as.character(tax_sub$Phylum[match(V(subi_network)$name,row.names(tax_sub))])
V(subi_network)$color=as.character(tax_sub$Phylum[match(V(subi_network)$name,row.names(tax_sub))])
for(n in names(table(V(subi_network)$Phylum))){
randomcol<-randomColor(count = 1)
V(subi_network)$color=gsub(n,randomcol,V(subi_network)$color, fixed=TRUE)
}
```
To create a circular network, we need to set a few more visual variables, like vertex and label size. It is also better to give each taxon an order in the graph, so that taxonomic groups stay together. As ggplot usually orders data alphabethically, we can use this attribute by assigning a letter (in this case random) to each taxonomic group.
```{r tax_to_vertex_2}
V(subi_network)$label.cex = 0.6
V(subi_network)$circord=as.character(tax_sub$Phylum[match(V(subi_network)$name,row.names(tax_sub))])
for(n in names(table(V(subi_network)$Phylum))){
randomalpha<-sample(LETTERS, 1, TRUE)
V(subi_network)$circord=gsub(n,randomalpha,V(subi_network)$circord, fixed=TRUE)
}
layout <- layout_in_circle(subi_network, order=order(V(subi_network)$circord))
```
and this is how the resulting network looks like
Note: as a good alternative to a circular representation: check out the fruchterman reingold representation of a network graph. This will better show the individual clusters. This can be done with the parameter layout=layout.fruchterman.reingold.
```{r plot_network}
plot(subi_network, layout=layout, vertex.size=5, vertex.label=V(subi_network)$Phylum)
```
# Network statistics
As you can see, this network has `r ecount(subi_network)` edges and `r vcount(subi_network)` nodes.
We can calculate some basic statistics on the network, like the modularity or the clustering coefficient (transivity). The modularity is a measure that indicates to what the degree the network exist of distinct modules, while the clustering coefficient is the average fraction of pairs of species one link away from a species that are also linked to one another.
```{r stats}
table(V(subi_network)$Phylum)
modularity(subi_network, walktrap.community(subi_network)$membership)
transitivity(subi_network, type="global")
```
# Analysis of statistical overrepresentation
Overrepresentation of higher level taxa (i.e. phyla or classes) in the networks can assessed with the hypergeometric distribution implemented in the R function phyper (at p adj.<0.05). This discrete probability distribution describes the chance of finding the observed number of nodes (i.e. OTUs) belonging to higher level taxa, when OTUs would be randomly drawn from the population without replacement. We can perform the analysis of overrepresentation of edges between higher level taxa using the binomial distribution (i.e. with replacement) implemented in the R function pbinom, and using the frequency of edges in the network as background probabilities (Lima-Mendez et al., 2015). A p-value can be calculated as the probability of obtaining the observed number of edges between two higher level taxa. All p-values should also be adjusted for multiple testing, for instance using the Benjamini-Hochberg correction (Benjamini and Hochberg, 1995).
To refresh our minds: although the original OTU table was stored as bacOTU with the taxonomy in bacTAX, we started the analysis with the data_coOccur object, where some non-informative OTUs were removed. this latter is our starting point to which we need to compare.
## over/under representation of taxa
In this first chunck of code, we'll calculate the expected frequencies
```{r overrepresentation_exp}
plotdata<-data.frame(exp=as.numeric(table(bacTAX$Phylum)), obs=0, row.names=names(table(bacTAX$Phylum)))
```
to this, we'll add the observed frequencies
```{r overrepresentation_obs}
for(i in 1:nrow(plotdata)){
if(rownames(plotdata)[[i]] %in% names(table(V(subi_network)$Phylum))){
plotdata[i,'obs']<-as.numeric(table(V(subi_network)$Phylum)[c(rownames(plotdata)[[i]])])
}
}
```
This looks like this:
```{r overrepresentation_obs_2}
head(plotdata)
```
Here, we calculate what taxa (i.e. OTUs assigned to a particular phylum) are overrepresented in the network. First, we can have a quick look.
```{r overrepresentation_Taxa_1}
plot(plotdata$exp, plotdata$obs);text(plotdata$exp, plotdata$obs, labels=rownames(plotdata))
```
Then, we can calculate what taxonomic groups are overrepresented. In this process, we will also correct the p-values for multiple testing.
```{r overrepresentation_Taxa_2}
plotdata$phypMore<-phyper(plotdata[,'obs'], plotdata[,'exp'], sum(plotdata$exp)-plotdata[,'exp'], sum(plotdata$obs), lower.tail = FALSE)
plotdata$phypMore<-p.adjust(plotdata$phypMore, 'BH')
plotdata$phypLess<-phyper(plotdata[,'obs'], plotdata[,'exp'], sum(plotdata$exp)-plotdata[,'exp'], sum(plotdata$obs), lower.tail = TRUE)
plotdata$phypLess<-p.adjust(plotdata$phypLess, 'BH')
plotdata$statexp<-plotdata[,'exp']/sum(plotdata[,'exp'])*sum(plotdata[,'obs'])
```
the overrepresented taxonomic groups are:
```{r overrepresentation_Taxa_3, echo=FALSE}
rownames(plotdata[plotdata$phypMore<0.05,])
```
and the underrepresented taxonomic groups are:
```{r overrepresentation_Taxa_4, echo=FALSE}
rownames(plotdata[plotdata$phypLess<0.05,])
```
## over/under representation of edges
We can also asses if there are edges that are significantle over or underrepresented compared to randomly drawing edges between OTUs
```{r overrepresentation_Edges_1}
edgdata<-data.frame(taxon1=NA, taxon2=NA, exp=NA, obs=NA, prob=NA, statexp=NA)
ttedg<-length(colnames(data_coOccur))*length(colnames(data_coOccur))
cnt=0
for(i in 1:length(colnames(data_coOccur))){
for(j in 1:length(colnames(data_coOccur))){
cnt=cnt+1
tax1name=names(table(colnames(data_coOccur)))[[i]]
tax2name=names(table(colnames(data_coOccur)))[[j]]
edgdata[cnt,"taxon1"]<-tax1name
edgdata[cnt,"taxon2"]<-tax2name
totedg<-length(colnames(data_coOccur))*length(colnames(data_coOccur))
edgdata[cnt,"exp"]<-totedg
if(totedg>0){
taxnames<-colnames(data_coOccur)
OTUnames1<-intersect(rownames(netmatrix_parse), rownames(bacTAX[bacTAX$Phylum==tax1name,]))
OTUnames2<-intersect(rownames(netmatrix_parse), rownames(bacTAX[bacTAX$Phylum==tax2name,]))
nt_mtrx_test<- netmatrix_parse[OTUnames1,OTUnames2]
if(length(taxnames)>0){
obsedg<-sum(nt_mtrx_test!=0)
edgdata[cnt,"obs"]<-obsedg
edgdata[cnt,"prob"]<-as.numeric(pbinom(obsedg, totedg, obsedg/ecount(subi_network)))
edgdata[cnt,"statexp"]<-qbinom(0.5, ecount(subi_network), totedg/ttedg)
} else{
edgdata[cnt,"prob"]<-NA
}
} else{
edgdata[cnt,"prob"]<-NA
}
}
}
edgdata<-edgdata[complete.cases(edgdata),]
edgdata$prob<-p.adjust(edgdata$prob, 'BH')
```
These edges are significantly over or underreperesented
```{r overrepresentation_Edges_2, echo=FALSE}
edgdata_sign<-edgdata[edgdata$prob<0.05 & edgdata$obs>0,];edgdata_sign
```