forked from krabberod/UNIS_AB332_2024
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathComputer_lab_I.Rmd
286 lines (219 loc) · 10.4 KB
/
Computer_lab_I.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
---
title: "Community ecology - Computer lab I - AB332"
author: "Anders K. Krabberød (UiO) and Ramiro Logares (ICM)"
date: "October 2024"
output:
html_notebook: default
pdf_document: default
---
>This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code.
>Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Cmd+Shift+Enter* (Mac).
>
>Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Cmd+Option+I* (Mac).
>
>When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Cmd+Shift+K* (Mac) to preview the HTML file).
>
>The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike *Knit*, *Preview* does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.
Make sure you have installed all packages!
### Load Packages
```{r,include=FALSE}
library(vegan)
library(tidyverse)
```
# Starting community ecology analyses
Read the data from the github page:
https://github.com/krabberod/UNIS_AB332_2024
```{r}
otu.tab <- read_tsv("https://raw.githubusercontent.com/krabberod/UNIS_AB332_2024/main/computer_lab/data/AB332_otutab_reduc3.txt")
```
First, get to know the data:
- *How many samples and how many OTUs are in the dataset? *
- *What do the numbers in the sample names mean?*
```{r}
head(otu.tab)
dim(otu.tab)
```
You can look at a given selection of the table by specifying a range of rows and columns:
```{r}
otu.tab[5:15, 1:5] # The first 10 rows, and the first 5 columns
```
You can also see the entire table with the `View()` function:
```{#r}
View(otu.tab)
```
- *See if you can choose a different subset. For instance samples 6-12 and OTUs 20-26:*
```{r}
```
Assign OTU-numbers as rownames
```{r}
otu.tab <- column_to_rownames(otu.tab, var = "OTUNumber")
```
Let's check the names
```{r}
head(rownames(otu.tab))
tail(rownames(otu.tab))
dim(otu.tab)
colnames(otu.tab)
```
For simplicity, I have included only 25 samples in the rest of the tutorial. As an exercise, *you could redo the analysis with the full dataset*. I.e. remove the part of the code that selects particular samples in the following chunk.
This way your numbers will differ from the pdf, and you can also see the effect of a different dataset.s
```{r}
otu.tab.red <- otu.tab[, 6:30]
```
The data needs to be transposed since this is how Vegan likes it (i.e. Vegan prefers OTUs as columns, and Samples as rows).
```{r}
otu.tab.trans <- t(otu.tab.red)
otu.tab.trans[1:5, 1:5]
```
You can get the total number of reads for each sample using rowSums().
and the total reads per OTU with colSums().
- *Do you understand what these two functions do just by looking at their names?*
```{r}
rowSums(otu.tab.trans)
head(colSums(otu.tab.trans)) # Too many to show them all.
```
- *Can you figure out a way to view only the last part of the the list of total reads per OTU? Hint: look up the help file for the head() function*:
```{r}
?head
```
Since I have selected only a few of the samples is possible that some of the OTU's are left with a total abundance of zero. In R it is possible to have functions within functions so the following will print the number of columns in the data set that has a sum equal to 0:
```{r}
length(which(colSums(otu.tab.trans) == 0))
```
This code is nested so that R reads evaluates the innermost function first (i.e ```colSums()```), then applies the next function to that result (i.e. ```which (...) ==0```), then finally the outermost function ```lenght()```. In "normal" language the nested function asks: what is the length of the list of column-sums which are exactly zero.
We can use the same idea of a function within a function to exclude the OTUs with a total number of 0. In the next chunk we use the square brackets to make a selection in the dataframe (as before), and add a '-' to get the opposite of what is evaluated by the function.
```{r}
otu.tab.trans <- otu.tab.trans[, -(which(colSums(otu.tab.trans) == 0))]
```
Now how many are 0?
```{r}
length(which(colSums(otu.tab.trans) == 0))
```
How many have more than 0 reads?
```{r}
length(which(colSums(otu.tab.trans) > 0))
```
Can you find how many OTU's that have more than 10 reads (in total)?
```{r}
```
## Common metrics and methods
The following calculations make use of functions in the vegan package written by Jari Oksanen.
*Vegan is an R package for community ecologists. It contains the most popular methods of multivariate analysis needed in analyzing ecological communities, and tools for diversity analysis, and other potentially useful functions*. If you want to learn more about the vegan package you can check out the vignette (a form for introduction) by running: ```browseVignettes("vegan")```
### Rarefaction
Let's calculate the number of reads per sample as rarefaction curves:
```{r}
rowSums(otu.tab.trans)
rarecurve(otu.tab.trans, step = 100, xlab = "Number of reads", ylab = "Richness", col = "blue")
```
How do you interpret these curves? Which samples have the lowest number of total reads? Which are the highest?
### Accumulation curves
```{r}
accum.curve <- specaccum(otu.tab.trans, method = "collector")
plot(accum.curve)
```
What does this curve represent? How do you interpret it?
### Richness estimations
Now lets do some ecology:
"The abundance-based estimates in estimateR use counts (numbers of individuals) of species in a single site. If called for a matrix or data frame, the function will give separate estimates for each site. The two variants of extrapolated richness in estimateR are bias-corrected Chao and Abundance-based Coverage Estimator (ACE) (O'Hara 2005, Chiu et al. 2014)."
```{r}
richness <- estimateR(otu.tab.trans, smallsample = TRUE)
richness
```
If you only want to look at one sample
```{r}
richness <- estimateR(otu.tab.trans[1,], smallsample = TRUE)
richness
```
Above we have the estimators Chao and ACE as well as the species number. What do the numbers mean?
### Evenness
```{r}
plot(colSums(otu.tab.trans), log = "y", xlab = "Rank", ylab = "Abundance", pch = 19, cex = 0.5, col = "blue")
plot(sort(colSums(otu.tab.trans), decreasing = TRUE), log = "y", xlab = "Rank", ylab = "Abundance", pch = 19, cex = 0.5, col = "blue")
```
### Fitting rank-abundance distribution models to the data
Function radfit() constructs rank – abundance or dominance / diversity or Whittaker plots and fit brokenstick, preemption, log-Normal, Zipf and Zipf-Mandelbrot models of species abundance.
```{r}
mod <- radfit(otu.tab.trans)
plot(mod)
```
And for all species
```{r}
mod.all <- radfit(colSums(otu.tab.trans))
plot(mod.all)
```
### Fitting data to the Preston model
```{r}
preston <- prestonfit(colSums(otu.tab.trans))
preston.dist <- prestondistr(colSums(otu.tab.trans))
plot(preston)
lines(preston.dist, line.col = "blue3")
```
### Extrapolated richness
```{r}
veiledspec(preston)
veiledspec(preston.dist)
```
### Shannon H index (considers richness and evenness)
A diversity index taking into account the number of individuals as well as the number of taxa. Varies from 0 for communities with only a single taxon to high values for communities with many taxa, each with few individuals
```{r}
H <- diversity(otu.tab.trans, index = "shannon")
H
plot(H, type = "l", col = "blue")
```
### Pielou's index of evenness (range 0-1, 1 = maximum evenness)
`J=H/Hmax`
`J=Shannon (H) / log(S=species richness)`
A calculated value of Pielou's evenness ranges from 0 (no evenness) to 1 (complete evenness).
```{r}
J <- H / log(rowSums(otu.tab.trans > 0))
```
### Inverse Simpson's D index (richness+evenness. Larger values, larger diversity)
The value of Simpson’s D ranges from 0 to 1, with 0 representing infinite diversity and 1 representing no diversity, so the larger the value of 𝐷 , the lower the diversity. For this reason, Simpson’s index is usually expressed as its inverse
```{r}
inv.simpson <- diversity(otu.tab.trans, "simpson")
plot(inv.simpson, type = "l", col = "blue")
```
# Beta diversity
We rarefy all samples to the same sequencing depth, to reduce biases.
```{r}
min(rowSums(otu.tab.trans)) # We calculate the sample with the minimum amount of reads
otu.tab.trans.ss <- rrarefy(otu.tab.trans, min(rowSums(otu.tab.trans))) # Samples are rarefied to lowest number of reads
rowSums(otu.tab.trans.ss)
head(colSums(otu.tab.trans.ss))
```
What is the number of reads these samples have been rarified to? What does it imply, do you understand how it is done?
Check that the number of OTUs are the same in the new table
```{r}
dim(otu.tab.trans)
dim(otu.tab.trans.ss)
```
The tables have the same size, but, after removing reads, OTUs might be left with zero read abundance. These are typically those with very low abundace to begin with, and (hopefully) do not play an important in the system we are studing.
```{r}
length(which(colSums(otu.tab.trans) == 0))
length(which(colSums(otu.tab.trans.ss) == 0))
head(which(colSums(otu.tab.trans.ss) == 0)) # Show the OTUs and the position in the table that have 0 abundance for the first OTUs
```
- *How many OTUs are empty after rarefaction?*
We can compare the number of reads for a selected OTU (in this case the 13th OTU in the list) between the origianl and the subsampled OTU table:
```{r}
colnames(otu.tab.trans)[13]
otu.tab.trans[, 13] # This gives the abundance of the OTU1009 across the different samples in the table that is NOT subsampled
otu.tab.trans.ss[, 13] # # This gives the abundance of the OTU1009 across the different samples in the table that IS subsampled
```
We can remove the OTUs with zero abundance with a similar command as we used at the beginning of the lab:
```{r}
otu.tab.trans.ss.nozero <- otu.tab.trans.ss[, -(which(colSums(otu.tab.trans.ss) == 0))] # Removes OTUs with zero abundance
length(which(colSums(otu.tab.trans.ss.nozero) == 0)) # Check that no zero abundance OTUs are left
```
Let's check dimensions of the tables:
```{r}
dim(otu.tab.trans.ss)
dim(otu.tab.trans.ss.nozero)
```
-*How many OTUs gave been removed?*
There are other ways to transform and normalise the data, but we will not go into the details in this course.
Here's an example for those interested:
*Phew* That was Part I. Now before you have a break save the data so it can be loaded if you want to use some of the same data for the next session. This way you don't have to redo all analysis for the next lab.
```{r}
save.image("AB332_lab_I.RData")
```