---
title: 'Community ecology - Computer lab III - AB332'
author: "Anders K. Krabberød (UiO) and Ramiro Logares (ICM)"
date: "October 2024"
output:
  html_notebook: default
  pdf_document: default
---
# Computer lab III
### First load the necessary packages.
```{r,include=FALSE}
library(tidyverse)
library(vegan)
library(corrplot)
library(RcmdrMisc)
library(ggord)
```
Load the data from the previous lab (if it is not already in the environment):
```{r}
load("AB332_lab_II.RData")
```
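If you are unsure whether the load worked, here is an optional quick check that some of the objects used later in this lab are present (the object names are the ones created in lab II):
```{r}
# Optional sanity check: these objects from lab II are used later in this lab.
exists("otu.tab.red")
exists("otu.tab.trans.ss.nozero.bray")
exists("otu.tab.trans.ss.nozero.bray.nmds")
```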
### Analyses using environmental variation
The aim is to investigate how environmental variation may explain the variation in community composition.
**Read table with environmental data**:
```{r}
isa.metadata<-read_tsv("https://raw.githubusercontent.com/krabberod/UNIS_AB332_2024/main/computer_lab/data/AB332metadata_v3.txt")
isa.metadata
isa.metadata <- column_to_rownames(isa.metadata, var = "Sample_Name")
```
Check the variables available in the metadata:
```{r}
colnames(isa.metadata)
```
Subset the metadata to the same samples that we have been using in the previous labs.
```{r}
isa.metadata.simp <- isa.metadata[6:30,]
isa.metadata.simp
```
Check that the samples are correct, i.e. that the same names are in the OTU-table and the metadata:
```{r}
identical(colnames(otu.tab.red),rownames(isa.metadata.simp))
```
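If this returns `FALSE`, a minimal way to see which sample names differ between the two objects (there is nothing to do here if it returned `TRUE`):
```{r}
# Only useful if the check above is FALSE: names present in one set but not the other.
setdiff(colnames(otu.tab.red), rownames(isa.metadata.simp))
setdiff(rownames(isa.metadata.simp), colnames(otu.tab.red))
```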
For this exercise we will use a selection of the continuous variables as metadata.
This line extracts 8 of the variables based on their column names:
```{r}
isa.metadata.cont <- isa.metadata.simp %>% dplyr::select("nitrate","phosphate","silicate","N_P","temp_25","sal_25","F_25","chla_GFF")
```
We transform the variables to z-scores so that they have comparable ranges of variation.
```{r}
isa.metadata.cont.zscores <- scale(isa.metadata.cont, center = T, scale = T)
isa.metadata.cont.zscores[1:5,]
```
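As an optional sanity check (assuming nitrate has no missing values), `scale()` is just the usual (x - mean) / sd transformation applied per column:
```{r}
# Optional: verify that scale() matches a manual z-score for one variable (nitrate).
nitrate.z <- (isa.metadata.cont$nitrate - mean(isa.metadata.cont$nitrate)) /
  sd(isa.metadata.cont$nitrate)
all.equal(as.numeric(isa.metadata.cont.zscores[, "nitrate"]), nitrate.z)
```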
Let's check whether the environmental variables are correlated with each other.
Calculate the correlations and p-values:
```{r}
(env.corr.signif.adjust <- rcorr.adjust(as.matrix(isa.metadata.cont.zscores)))
```
The p-values are corrected for multiple comparisons using Holm's method (see `?p.adjust`).
More info: https://en.wikipedia.org/wiki/Multiple_comparisons_problem
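To get a feel for what the Holm correction does, here is a tiny illustration on made-up p-values (not part of the analysis): the smallest p-value is multiplied by n, the next by n - 1, and so on, with monotonicity enforced.
```{r}
# Toy example with invented p-values, just to illustrate p.adjust(..., "holm").
p.raw <- c(0.001, 0.010, 0.030, 0.040)
p.adjust(p.raw, method = "holm")
```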
Extract the correlation coefficients and the Holm-corrected p-values:
```{r}
env.corr.signif.r <- env.corr.signif.adjust$R$r
env.corr.signif.p <- env.corr.signif.adjust$P
# Edit the object to replace any "<" by "0" using the function "gsub"
env.corr.signif.p <- gsub("<","0", env.corr.signif.p)
# Convert the object to numeric (the p-values are still character strings)
# so that the matrix of p-values can be used by corrplot below.
env.corr.signif.p <- apply(env.corr.signif.p, 2 ,as.numeric)
rownames(env.corr.signif.p) <- colnames(env.corr.signif.p)
```
Plot the correlation plot:
```{r}
corrplot(env.corr.signif.r, type = "upper", order = "hclust",
         p.mat = env.corr.signif.p, sig.level = 0.05, insig = "pch",
         hclust.method = "average", tl.cex = 0.8, tl.col = "black", diag = FALSE)
```
### Fitting environmental variables to ordinations
`envfit` fits the environmental variables to the NMDS ordination as vectors.
First we fit the variables to the NMDS based on the rarefied OTU table:
```{r}
otu.tab.trans.ss.nozero.bray.nmds.envfit <- envfit(otu.tab.trans.ss.nozero.bray.nmds,
as.data.frame(isa.metadata.cont.zscores), permu=999, na.rm=T, display ="sites")
otu.tab.trans.ss.nozero.bray.nmds.envfit
```
The last two columns show the squared correlation coefficient (r2) and the associated permutation p-value.
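If you want to pull these numbers out of the `envfit` object directly (they are stored in its `$vectors` component), a short sketch:
```{r}
# r2 and permutation p-values for each environmental vector.
otu.tab.trans.ss.nozero.bray.nmds.envfit$vectors$r
otu.tab.trans.ss.nozero.bray.nmds.envfit$vectors$pvals
```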
First, plot the samples and all the environmental vectors:
```{r}
plot(otu.tab.trans.ss.nozero.bray.nmds, type="t", display="sites") # plot the samples
plot(otu.tab.trans.ss.nozero.bray.nmds.envfit) # plot all environmental vectors
```
Then plot only the vectors with p < 0.01:
```{r}
plot(otu.tab.trans.ss.nozero.bray.nmds, type="t", display="sites") # plot the samples
plot(otu.tab.trans.ss.nozero.bray.nmds.envfit, p.max=0.01)
```
Can you see any difference?
## Constrained Ordination
Distance-based redundancy analysis (dbRDA) is an ordination method similar to redundancy analysis (RDA), but it allows non-Euclidean dissimilarity indices, such as the Manhattan or Bray-Curtis distance.
Selection of the most important (i.e. significant) variables for the dbRDA is done by comparing a null model to the full model and doing a stepwise selection of significant variables.
Start with a model containing only species matrix and intercept:
```{r}
mod0.rarefaction <- capscale(otu.tab.trans.ss.nozero.bray ~ 1, as.data.frame(isa.metadata.cont.zscores))
mod0.rarefaction
```
Now make a model including all variables from the environmental matrix (the dot after the tilde (~) means ALL!):
```{r}
mod1.rarefaction <- capscale(otu.tab.trans.ss.nozero.bray ~ ., as.data.frame(isa.metadata.cont.zscores))
mod1.rarefaction
```
> NB: here you might get an error if you have missing values. Missing values can be dealt with in different ways depending on the situation. Sometimes it is easiest to drop the sample, sometimes you can impute the missing values. The default in `capscale` (`na.fail`) is to stop when there are missing values. The choices `na.omit` and `na.exclude` delete rows with missing values, but differ in the representation of the results: with `na.omit` only non-missing site scores are shown, whereas `na.exclude` gives NA for the scores of missing observations.
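Should you run into this with your own data, here is a minimal sketch of the same full model with `na.action = na.omit` (with complete metadata, as here, it gives the same result as above):
```{r}
# Sketch: drop rows with missing metadata instead of failing.
mod1.naomit <- capscale(otu.tab.trans.ss.nozero.bray ~ .,
                        as.data.frame(isa.metadata.cont.zscores),
                        na.action = na.omit)
mod1.naomit
```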
Finally do the stepwise selection of variables:
```{r}
ordistep(mod0.rarefaction, scope = formula(mod1.rarefaction), perm.max = 1000, direction="both")
```
*Can you see which variables were selected?*
When building models stepwise you can go either "forward", "backward", or "both" (as in the example above, which used `direction="both"`). Try the different methods and see if the end result is any different.
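For comparison, a backward run starts from the full model and drops variables that do not improve the fit. A minimal sketch (the selected variables may or may not match the run above):
```{r}
# Backward selection: start from the full model and drop non-significant terms.
ordistep(mod1.rarefaction, perm.max = 1000, direction = "backward")
```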
## Plot the ordination
In the following sections we will use `ggord` for more control over the ordination plot. ggord is a package that builds on ggplot2 and can take many different parameters. See `?ggord` for details.
```{r}
isa.rarified.db <- dbrda(formula = otu.tab.trans.ss.nozero.bray ~ silicate+temp_25+sal_25+F_25+phosphate, data = as.data.frame(isa.metadata.cont.zscores))
stats::screeplot(isa.rarified.db)
ggord(isa.rarified.db, axes = c("1", "2"))
```
The plot wasn't very pretty. Specifying limits to the plot might help:
```{r}
isa.rarified.db <- dbrda(formula = otu.tab.trans.ss.nozero.bray ~ silicate+temp_25+sal_25+F_25+phosphate, data = as.data.frame(isa.metadata.cont.zscores))
stats::screeplot(isa.rarified.db)
ggord(isa.rarified.db,xlims=c(-1.3,1.3), ylims=c(-1.5,1))
```
`ggord` can take several parameters, for instance a grouping variable such as the seasons from the metadata:
```{r}
ggord(isa.rarified.db, isa.metadata.simp$seasons)
```
*See if you can modify the plot and make the text visible!*
Or the months:
```{r}
ggord(isa.rarified.db, as.factor(isa.metadata.simp$month))
```
And you can specify the limits of the axes if the plot does not look nice:
```{r}
ggord(isa.rarified.db, grp_in=as.factor(isa.metadata.simp$month), xlims=c(-1.3,1.3), ylims=c(-1.5,1.5))
```
<!-- # For those who need more: -->
<!-- ## Constrained Correspondence Analyses (CCA) -->
<!-- - suitable for linear variables -->
<!-- Selection of the most important variables -->
<!-- ```{r} -->
<!-- mod0.cca.rarefaction <- cca(otu.tab.trans.ss.nozero ~ 1, as.data.frame(isa.metadata.cont.zscores)) # model containing only species matrix and intercept -->
<!-- mod1.cca.rarefaction <- cca(otu.tab.trans.ss.nozero ~ ., as.data.frame(isa.metadata.cont.zscores)) # # model including all variables from env matrix (the dot after tilde (~) means ALL!) -->
<!-- ordistep(mod0.cca.rarefaction, scope = formula(mod1.cca.rarefaction), perm.max = 1000, direction="forward") -->
<!-- ``` -->
<!-- Replace the variables in the formula below with the significant variables from the selection procedure. -->
<!-- ```{r} -->
<!-- isa.clr.cca<-cca(formula = otu.tab.trans.ss.nozero ~ chla_GFF+ silicate+ N_P, data = as.data.frame(isa.metadata.cont.zscores)) -->
<!-- stats::screeplot(isa.clr.cca) -->
<!-- ``` -->
<!-- Plotting with some examples of parameters that can be used to control the plot. Use `?ggord` if you want to learn more. -->
<!-- ```{r} -->
<!-- ggord(isa.clr.cca, obslab=T, addsize=0.2,xlims=c(-2.3,2.3), ylims=c(-4,3), -->
<!-- addcol="#2ca25f",txt=3, ext=1.2, labcol="#762a83",veccol="#762a83", size=2.6,alpha=0.7) -->
<!-- ``` -->
<!-- # Permanova -->
<!-- Analysis of variance using distance matrices — for partitioning distance matrices among sources of variation and fitting linear models (e.g., factors, polynomial regression) to distance matrices; uses a permutation test with pseudo-F ratios. -->
<!-- ```{r} -->
<!-- adonis2(otu.tab.trans.ss.nozero~ silicate+temp_25+sal_25+F_25+phosphate, -->
<!-- data = as.data.frame(isa.metadata.cont.zscores), -->
<!-- method="bray", permutations=9999) -->
<!-- ``` -->
Finally, save the workspace:
```{r}
save.image("AB332_lab_III.RData")
```