---
title: 'Community ecology - Computer lab III - AB332'
author: "Anders K. Krabberød (UiO) and Ramiro Logares (ICM)"
date: "October 2024"
output:
  html_notebook: default
  pdf_document: default
---
# Computer lab III
### First load the necessary packages.
```{r,include=FALSE}
library(tidyverse)
library(vegan)
library(corrplot)
library(RcmdrMisc)
library(ggord)
```
Load the data from the previous lab (if it is not already in the environment):
```{r}
load("AB332_lab_II.RData")
```
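If you are unsure whether the load worked, here is an optional quick check that some of the objects used later in this lab are present (the object names are the ones created in lab II):
```{r}
# Optional sanity check: these objects from lab II are used later in this lab.
exists("otu.tab.red")
exists("otu.tab.trans.ss.nozero.bray")
exists("otu.tab.trans.ss.nozero.bray.nmds")
```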
### Analyses using environmental variation
The aim is to investigate how environmental variation may explain the variation in community composition.
**Read table with environmental data**:
```{r}
isa.metadata<-read_tsv("https://raw.githubusercontent.com/krabberod/UNIS_AB332_2024/main/computer_lab/data/AB332metadata_v3.txt")
isa.metadata
isa.metadata <- column_to_rownames(isa.metadata, var = "Sample_Name")
```
Check the variables available in the metadata:
```{r}
colnames(isa.metadata)
```
Subset the metadata to the same samples that we have been using in the previous labs.
```{r}
isa.metadata.simp <- isa.metadata[6:30,]
isa.metadata.simp
```
Check that the samples are correct, i.e. that the same names are in the OTU-table and the metadata:
```{r}
identical(colnames(otu.tab.red),rownames(isa.metadata.simp))
```
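If this returns `FALSE`, a minimal way to see which sample names differ between the two objects (there is nothing to do here if it returned `TRUE`):
```{r}
# Only useful if the check above is FALSE: names present in one set but not the other.
setdiff(colnames(otu.tab.red), rownames(isa.metadata.simp))
setdiff(rownames(isa.metadata.simp), colnames(otu.tab.red))
```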
For this exercise we will use a selection of the continuous variables as metadata.
This line extracts 8 of the variables based on their column names:
```{r}
isa.metadata.cont <- isa.metadata.simp %>% dplyr::select("nitrate","phosphate","silicate","N_P","temp_25","sal_25","F_25","chla_GFF")
```
We transform the variables to z-scores so that they have comparable ranges of variation.
```{r}
isa.metadata.cont.zscores <- scale(isa.metadata.cont, center = T, scale = T)
isa.metadata.cont.zscores[1:5,]
```
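As an optional sanity check (assuming nitrate has no missing values), `scale()` is just the usual (x - mean) / sd transformation applied per column:
```{r}
# Optional: verify that scale() matches a manual z-score for one variable (nitrate).
nitrate.z <- (isa.metadata.cont$nitrate - mean(isa.metadata.cont$nitrate)) /
  sd(isa.metadata.cont$nitrate)
all.equal(as.numeric(isa.metadata.cont.zscores[, "nitrate"]), nitrate.z)
```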
Let's check whether the environmental variables are correlated with each other.
Calculate the correlations and p-values:
```{r}
(env.corr.signif.adjust <- rcorr.adjust(as.matrix(isa.metadata.cont.zscores)))
```
The p-values are corrected for multiple comparisons using Holm's method (see `?p.adjust`).
More info: https://en.wikipedia.org/wiki/Multiple_comparisons_problem
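To get a feel for what the Holm correction does, here is a tiny illustration on made-up p-values (not part of the analysis): the smallest p-value is multiplied by n, the next by n - 1, and so on, with monotonicity enforced.
```{r}
# Toy example with invented p-values, just to illustrate p.adjust(..., "holm").
p.raw <- c(0.001, 0.010, 0.030, 0.040)
p.adjust(p.raw, method = "holm")
```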
Extract the correlation coefficients and the Holm-corrected p-values:
```{r}
env.corr.signif.r <- env.corr.signif.adjust$R$r
env.corr.signif.p <- env.corr.signif.adjust$P
# Edit the object to replace any "<" by "0" using the function "gsub"
env.corr.signif.p <- gsub("<","0", env.corr.signif.p)
# Convert the object to numeric (the p-values are still character strings)
# so that the matrix of p-values can be used by corrplot below.
env.corr.signif.p <- apply(env.corr.signif.p, 2 ,as.numeric)
rownames(env.corr.signif.p) <- colnames(env.corr.signif.p)
```
Plot the correlation plot:
```{r}
corrplot(env.corr.signif.r, type = "upper", order = "hclust",
         p.mat = env.corr.signif.p, sig.level = 0.05, insig = "pch",
         hclust.method = "average", tl.cex = 0.8, tl.col = "black", diag = FALSE)
```
### Fitting environmental variables to ordinations
`envfit` fits the environmental variables to the NMDS ordination as vectors.
First we fit the variables to the NMDS based on the rarefied OTU table:
```{r}
otu.tab.trans.ss.nozero.bray.nmds.envfit <- envfit(otu.tab.trans.ss.nozero.bray.nmds,
as.data.frame(isa.metadata.cont.zscores), permu=999, na.rm=T, display ="sites")
otu.tab.trans.ss.nozero.bray.nmds.envfit
```
The last two columns show the squared correlation coefficient (r2) and the associated permutation p-value.
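If you want to pull these numbers out of the `envfit` object directly (they are stored in its `$vectors` component), a short sketch:
```{r}
# r2 and permutation p-values for each environmental vector.
otu.tab.trans.ss.nozero.bray.nmds.envfit$vectors$r
otu.tab.trans.ss.nozero.bray.nmds.envfit$vectors$pvals
```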
First, plot the samples and all the environmental vectors:
```{r}
plot(otu.tab.trans.ss.nozero.bray.nmds, type="t", display="sites") # plot the samples
plot(otu.tab.trans.ss.nozero.bray.nmds.envfit) # plot all environmental vectors
```
Then plot only the vectors with p < 0.01:
```{r}
plot(otu.tab.trans.ss.nozero.bray.nmds, type="t", display="sites") # plot the samples
plot(otu.tab.trans.ss.nozero.bray.nmds.envfit, p.max=0.01)
```
Can you see any difference?
## Constrained Ordination
Distance-based redundancy analysis (dbRDA) is an ordination method similar to redundancy analysis (RDA), but it allows non-Euclidean dissimilarity indices, such as the Manhattan or Bray-Curtis distance.
Selection of the most important (i.e. significant) variables for the dbRDA is done by comparing a null model to the full model and doing a stepwise selection of significant variables.
Start with a model containing only species matrix and intercept:
```{r}
mod0.rarefaction <- capscale(otu.tab.trans.ss.nozero.bray ~ 1, as.data.frame(isa.metadata.cont.zscores))
mod0.rarefaction
```
Now make a model including all variables from the environmental matrix (the dot after the tilde (~) means ALL!):
```{r}
mod1.rarefaction <- capscale(otu.tab.trans.ss.nozero.bray ~ ., as.data.frame(isa.metadata.cont.zscores))
mod1.rarefaction
```
> NB: here you might get an error if you have missing values. Missing values can be dealt with in different ways depending on the situation. Sometimes it is easiest to drop the sample, sometimes you can impute the missing values. The default in `capscale` (`na.fail`) is to stop when there are missing values. The choices `na.omit` and `na.exclude` delete rows with missing values, but differ in the representation of the results: with `na.omit` only non-missing site scores are shown, whereas `na.exclude` gives NA for the scores of missing observations.
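Should you run into this with your own data, here is a minimal sketch of the same full model with `na.action = na.omit` (with complete metadata, as here, it gives the same result as above):
```{r}
# Sketch: drop rows with missing metadata instead of failing.
mod1.naomit <- capscale(otu.tab.trans.ss.nozero.bray ~ .,
                        as.data.frame(isa.metadata.cont.zscores),
                        na.action = na.omit)
mod1.naomit
```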
Finally do the stepwise selection of variables:
```{r}
ordistep(mod0.rarefaction, scope = formula(mod1.rarefaction), perm.max = 1000, direction="both")
```
*Can you see which variables were selected?*
When building models stepwise you can go either "forward", "backward", or "both" (as in the example above, which used `direction="both"`). Try the different methods and see if the end result is any different.
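For comparison, a backward run starts from the full model and drops variables that do not improve the fit. A minimal sketch (the selected variables may or may not match the run above):
```{r}
# Backward selection: start from the full model and drop non-significant terms.
ordistep(mod1.rarefaction, perm.max = 1000, direction = "backward")
```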
## Plot the ordination
In the following sections we will use `ggord` for more control over the ordination plot. ggord is a package that builds on ggplot2 and can take many different parameters. See `?ggord` for details.
```{r}
isa.rarified.db <- dbrda(formula = otu.tab.trans.ss.nozero.bray ~ silicate+temp_25+sal_25+F_25+phosphate, data = as.data.frame(isa.metadata.cont.zscores))
stats::screeplot(isa.rarified.db)
ggord(isa.rarified.db, axes = c("1", "2"))
```
The plot wasn't very pretty. Specifying limits to the plot might help:
```{r}
isa.rarified.db <- dbrda(formula = otu.tab.trans.ss.nozero.bray ~ silicate+temp_25+sal_25+F_25+phosphate, data = as.data.frame(isa.metadata.cont.zscores))
stats::screeplot(isa.rarified.db)
ggord(isa.rarified.db,xlims=c(-1.3,1.3), ylims=c(-1.5,1))
```
`ggord` can take several parameters, for instance a grouping variable such as the seasons from the metadata:
```{r}
ggord(isa.rarified.db, isa.metadata.simp$seasons)
```
*See if you can modify the plot and make the text visible!*
Or the months:
```{r}
ggord(isa.rarified.db, as.factor(isa.metadata.simp$month))
```
And you can specify the limits of the axes if the plot does not look nice:
```{r}
ggord(isa.rarified.db, grp_in=as.factor(isa.metadata.simp$month), xlims=c(-1.3,1.3), ylims=c(-1.5,1.5))
```
<!-- # For those who need more: -->
<!-- ## Constrained Correspondence Analyses (CCA) -->
<!-- - suitable for linear variables -->
<!-- Selection of the most important variables -->
<!-- ```{r} -->
<!-- mod0.cca.rarefaction <- cca(otu.tab.trans.ss.nozero ~ 1, as.data.frame(isa.metadata.cont.zscores)) # model containing only species matrix and intercept -->
<!-- mod1.cca.rarefaction <- cca(otu.tab.trans.ss.nozero ~ ., as.data.frame(isa.metadata.cont.zscores)) # # model including all variables from env matrix (the dot after tilde (~) means ALL!) -->
<!-- ordistep(mod0.cca.rarefaction, scope = formula(mod1.cca.rarefaction), perm.max = 1000, direction="forward") -->
<!-- ``` -->
<!-- Replace the variables in the formula below with the significant variables from the selection procedure. -->
<!-- ```{r} -->
<!-- isa.clr.cca<-cca(formula = otu.tab.trans.ss.nozero ~ chla_GFF+ silicate+ N_P, data = as.data.frame(isa.metadata.cont.zscores)) -->
<!-- stats::screeplot(isa.clr.cca) -->
<!-- ``` -->
<!-- Plotting with some examples of parameters that can be used to control the plot. Use `?ggord` if you want to learn more. -->
<!-- ```{r} -->
<!-- ggord(isa.clr.cca, obslab=T, addsize=0.2,xlims=c(-2.3,2.3), ylims=c(-4,3), -->
<!-- addcol="#2ca25f",txt=3, ext=1.2, labcol="#762a83",veccol="#762a83", size=2.6,alpha=0.7) -->
<!-- ``` -->
<!-- # Permanova -->
<!-- Analysis of variance using distance matrices — for partitioning distance matrices among sources of variation and fitting linear models (e.g., factors, polynomial regression) to distance matrices; uses a permutation test with pseudo-F ratios. -->
<!-- ```{r} -->
<!-- adonis2(otu.tab.trans.ss.nozero~ silicate+temp_25+sal_25+F_25+phosphate, -->
<!-- data = as.data.frame(isa.metadata.cont.zscores), -->
<!-- method="bray", permutations=9999) -->
<!-- ``` -->
Finally, save the workspace:
```{r}
save.image("AB332_lab_III.RData")
```