computer_lab/Computer_lab_I.Rmd

---
title: "Community ecology - Computer lab I - AB332"
author: "Anders K. Krabberød (UiO) and Ramiro Logares (ICM)"
date: "October 2024"
output:
  html_notebook: default
  pdf_document: default
---

>This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code.
>Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Cmd+Shift+Enter* (Mac).
>
>Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Cmd+Option+I* (Mac).
>
>When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Cmd+Shift+K* (Mac) to preview the HTML file).
>
>The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike *Knit*, *Preview* does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.


Make sure you have installed all packages!

### Load Packages
```{r,include=FALSE}
library(vegan)
library(tidyverse)
```


# Starting community ecology analyses

Read the data from the github page:
https://github.com/krabberod/UNIS_AB332_2024
```{r}
otu.tab <- read_tsv("https://raw.githubusercontent.com/krabberod/UNIS_AB332_2024/main/computer_lab/data/AB332_otutab_reduc3.txt")
```

First, get to know the data: 
- *How many samples and how many OTUs are in the dataset? *
- *What do the numbers in the sample names mean?*
```{r}
head(otu.tab)
dim(otu.tab)
```

You can look at a given selection of the table by specifying a range of rows and columns:
```{r}
otu.tab[5:15, 1:5] # The first 10 rows, and the first 5 columns
```

You can also see the entire table with the `View()` function:
```{#r}
View(otu.tab)
```

- *See if you can choose a different subset. For instance samples 6-12 and OTUs 20-26:*
```{r}

```

Assign OTU-numbers as rownames
```{r}
otu.tab <- column_to_rownames(otu.tab, var = "OTUNumber")
```

Let's check the names
```{r}
head(rownames(otu.tab))
tail(rownames(otu.tab))
dim(otu.tab)
colnames(otu.tab)
```

For simplicity, I have included only 25 samples in the rest of the tutorial. As an exercise, *you could redo the analysis with the full dataset*. I.e. remove the part of the code that selects particular samples in the following chunk.  
This way your numbers will differ from the pdf, and you can also see the effect of a different dataset.s
```{r}
otu.tab.red <- otu.tab[, 6:30]
```

The data needs to be transposed since this is how Vegan likes it (i.e. Vegan prefers OTUs as columns, and Samples as rows).
```{r}
otu.tab.trans <- t(otu.tab.red)
otu.tab.trans[1:5, 1:5]
```

You can get the total number of reads for each sample using rowSums().
and the total reads per OTU with colSums().  
- *Do you understand what these two functions do just by looking at their names?*

```{r}
rowSums(otu.tab.trans)
head(colSums(otu.tab.trans)) # Too many to show them all.
```
- *Can you figure out a way to view only the last part of the the list of total reads per OTU? Hint: look up the help file for the head() function*:


```{r}
?head
```

Since I have selected only a few of the samples is possible that some of the OTU's are left with a total abundance of zero. In R it is possible to have functions within functions so the following will print the number of columns in the data set that has a sum equal to 0:

```{r}
length(which(colSums(otu.tab.trans) == 0))
```
This code is nested so that R reads evaluates the innermost function first (i.e ```colSums()```), then applies the next function to that result (i.e. ```which (...) ==0```), then finally the outermost function ```lenght()```. In "normal" language the nested function asks: what is the length of the list of column-sums which are exactly zero. 


We can use the same idea of a function within a function to exclude the OTUs with a total number of 0. In the next chunk we use the square brackets to make a selection in the dataframe (as before), and add a '-' to get the opposite of what is evaluated by the function. 
```{r}
otu.tab.trans <- otu.tab.trans[, -(which(colSums(otu.tab.trans) == 0))]
```

Now how many are 0?
```{r}
length(which(colSums(otu.tab.trans) == 0))
```

How many have more than 0 reads?
```{r}
length(which(colSums(otu.tab.trans) > 0))
```

Can you find how many OTU's that have more than 10 reads (in total)?
```{r}

```

## Common metrics and methods

The following calculations make use of functions in the vegan package written by Jari Oksanen.  
*Vegan is an R package for community ecologists. It contains the most popular methods of multivariate analysis needed in analyzing ecological communities, and tools for diversity analysis, and other potentially useful functions*. If you want to learn more about the vegan package you can check out the vignette (a form for introduction) by running: ```browseVignettes("vegan")```


### Rarefaction

Let's calculate the number of reads per sample as rarefaction curves:
```{r}
rowSums(otu.tab.trans)
rarecurve(otu.tab.trans, step = 100, xlab = "Number of reads", ylab = "Richness", col = "blue")
```
How do you interpret these curves? Which samples have the lowest number of total reads? Which are the highest?

### Accumulation curves
```{r}
accum.curve <- specaccum(otu.tab.trans, method = "collector")
plot(accum.curve)
```
What does this curve represent? How do you interpret it?

### Richness estimations
Now lets do some ecology: 

"The abundance-based estimates in estimateR use counts (numbers of individuals) of species in a single site. If called for a matrix or data frame, the function will give separate estimates for each site. The two variants of extrapolated richness in estimateR are bias-corrected Chao and Abundance-based Coverage Estimator (ACE) (O'Hara 2005, Chiu et al. 2014)." 
```{r}
richness <- estimateR(otu.tab.trans, smallsample = TRUE)
richness
```
If you only want to look at one sample
```{r}
richness <- estimateR(otu.tab.trans[1,], smallsample = TRUE)
richness
```


Above we have the estimators Chao and ACE as well as the species number. What do the numbers mean?


### Evenness
```{r}
plot(colSums(otu.tab.trans), log = "y", xlab = "Rank", ylab = "Abundance", pch = 19, cex = 0.5, col = "blue")
plot(sort(colSums(otu.tab.trans), decreasing = TRUE), log = "y", xlab = "Rank", ylab = "Abundance", pch = 19, cex = 0.5, col = "blue")
```

### Fitting rank-abundance distribution models to the data

Function radfit() constructs rank – abundance or dominance / diversity or Whittaker plots and fit brokenstick, preemption, log-Normal, Zipf and Zipf-Mandelbrot models of species abundance.

```{r}
mod <- radfit(otu.tab.trans)
plot(mod)
```

And for all species 
```{r}
mod.all <- radfit(colSums(otu.tab.trans))
plot(mod.all)
```

### Fitting data to the Preston model
```{r}
preston <- prestonfit(colSums(otu.tab.trans))
preston.dist <- prestondistr(colSums(otu.tab.trans))
plot(preston)
lines(preston.dist, line.col = "blue3")
```

### Extrapolated richness
```{r}
veiledspec(preston)
veiledspec(preston.dist)
```

### Shannon H index (considers richness and evenness)
A diversity index taking into account the number of individuals as well as the number of taxa. Varies from 0 for communities with only a single taxon to high values for communities with many taxa, each with few individuals

```{r}
H <- diversity(otu.tab.trans, index = "shannon")
H
plot(H, type = "l", col = "blue")
```

### Pielou's index of evenness (range 0-1, 1 = maximum evenness)

`J=H/Hmax`  
`J=Shannon (H) / log(S=species richness)`
A calculated value of Pielou's evenness ranges from 0 (no evenness) to 1 (complete evenness).
  
```{r}
J <- H / log(rowSums(otu.tab.trans > 0))
```

### Inverse Simpson's D index (richness+evenness. Larger values, larger diversity)
The value of Simpson’s D ranges from 0 to 1, with 0 representing infinite diversity and 1 representing no diversity, so the larger the value of  𝐷 , the lower the diversity. For this reason, Simpson’s index is usually expressed as its inverse

```{r}
inv.simpson <- diversity(otu.tab.trans, "simpson")
plot(inv.simpson, type = "l", col = "blue")
```

# Beta diversity

We rarefy all samples to the same sequencing depth, to reduce biases.
```{r}
min(rowSums(otu.tab.trans)) # We calculate the sample with the minimum amount of reads

otu.tab.trans.ss <- rrarefy(otu.tab.trans, min(rowSums(otu.tab.trans))) # Samples are rarefied to lowest number of reads
rowSums(otu.tab.trans.ss)
head(colSums(otu.tab.trans.ss))
```
What is the number of reads these samples have been rarified to? What does it imply, do you understand how it is done?

Check that the number of OTUs are the same in the new table
```{r}
dim(otu.tab.trans)
dim(otu.tab.trans.ss)
```

The tables have the same size, but, after removing reads, OTUs might be left with zero read abundance. These are typically those with very low abundace to begin with, and (hopefully) do not play an important in the system we are studing. 
```{r}
length(which(colSums(otu.tab.trans) == 0))
length(which(colSums(otu.tab.trans.ss) == 0))
head(which(colSums(otu.tab.trans.ss) == 0)) # Show the OTUs and the position in the table that have 0 abundance for the first OTUs
```
- *How many OTUs are empty after rarefaction?*  

We can compare the number of reads for a selected OTU (in this case the 13th OTU in the list) between the origianl and the subsampled OTU table:
```{r}
colnames(otu.tab.trans)[13]
otu.tab.trans[, 13] # This gives the abundance of the OTU1009  across the different samples in the table that is NOT subsampled
otu.tab.trans.ss[, 13] # # This gives the abundance of the OTU1009  across the different samples in the table that IS subsampled
```

We can remove the OTUs with zero abundance with a similar command as we used at the beginning of the lab:
```{r}
otu.tab.trans.ss.nozero <- otu.tab.trans.ss[, -(which(colSums(otu.tab.trans.ss) == 0))] # Removes OTUs with zero abundance
length(which(colSums(otu.tab.trans.ss.nozero) == 0)) # Check that no zero abundance OTUs are left
```

Let's check dimensions of the tables:
```{r}
dim(otu.tab.trans.ss)
dim(otu.tab.trans.ss.nozero)
```

-*How many OTUs gave been removed?*  
There are other ways to transform and normalise the data, but we will not go into the details in this course. 
Here's an example for those interested: 

 
 *Phew* That was Part I. Now before you have a break save the data so it can be loaded if you want to use some of the same data for the next session. This way you don't have to redo all analysis for the next lab.
 
```{r}
save.image("AB332_lab_I.RData")
```