---
title: "Temporal microbiota trajectory"
author: "Leo Lahti, Sudarshan Shetty et al."
bibliography:
- bibliography.bib
output:
  BiocStyle::html_document:
    number_sections: no
    toc: yes
    toc_depth: 4
    toc_float: true
    self_contained: true
    thumbnails: true
    lightbox: true
    gallery: true
    use_bookdown: false
    highlight: haddock
---
<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{microbiome tutorial - temporal microbiota trajectory}
%\usepackage[utf8]{inputenc}
%\VignetteEncoding{UTF-8}
-->
Nowadays, an increasing number of microbiota researchers are incorporating temporal aspects into their study designs. These range from long-term evaluations of soil microbiota to intervention studies in human microbiome research. Here, we present some code and visualizations for longitudinal microbiota sampling. We hope to add related functionality in future releases of the `microbiome` package.

In this tutorial, we demonstrate how to plot the trajectory of a microbiota through time.

We use the human microbiome time series data `moving_pictures` from [Caporaso et al., 2011 Genome Biol.](https://genomebiology.biomedcentral.com/articles/10.1186/gb-2011-12-5-r50). These data are the files reprocessed as part of the [Earth Microbiome Project](https://www.nature.com/articles/nature24621). The data are packaged in the R package [jeevanuDB](https://github.com/microsud/jeevanuDB) for use in this tutorial.
```{r lib-1, warning=FALSE, message=FALSE, eval=TRUE}
#### To install this pkg ####
# install.packages("devtools")
# devtools::install_github("microsud/jeevanuDB")
############################
library(microbiome)
library(jeevanuDB) # external database pkg for microbiome pkg with test data
library(dplyr)
library(ggplot2)
library(viridis)
library(knitr)
```
We use the `moving_pictures` dataset.
```{r data-eg, warning=FALSE, message=FALSE, eval=TRUE}
# Example data
data("moving_pictures")
# Rename
ps <- moving_pictures
```
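As a quick optional check, print the `phyloseq` object to see how many taxa, samples and metadata variables it contains.
```{r data-check, warning=FALSE, message=FALSE, eval=TRUE}
# Quick look at the phyloseq object: number of taxa, samples and metadata variables
print(ps)
```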
Check which sample types and how many samples are present for each subject.
```{r sam-data, warning=FALSE, message=FALSE, eval=TRUE}
kable(table(meta(ps)$host_subject_id, meta(ps)$sample_type))
```
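We can also summarise the stool samples per subject together with their sampling span (an optional check; this assumes `days_since_experiment_start` is stored as a numeric variable, as it is used later for ordering and coloring).
```{r sam-time, warning=FALSE, message=FALSE, eval=TRUE}
# Optional check: number of stool samples and sampling span per subject.
# Assumes days_since_experiment_start is numeric in the sample metadata.
meta(ps) %>%
  filter(sample_type == "stool") %>%
  group_by(host_subject_id) %>%
  summarise(
    n_samples = n(),
    first_day = min(days_since_experiment_start),
    last_day = max(days_since_experiment_start)
  ) %>%
  kable()
```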
Use only the stool (gut) microbiota data.
```{r get-dat, warning=FALSE, message=FALSE, eval=TRUE}
ps.gut <- subset_samples(ps, sample_type == "stool")
taxa_names(ps.gut) <- paste0("ASV-", seq(ntaxa(ps.gut))) # rename sequences to ids
# remove ASVs that are zero in all of these samples
ps.gut <- prune_taxa(taxa_sums(ps.gut) > 0, ps.gut)
# remove samples with fewer than 500 reads. Note: this threshold is a user choice;
# here we just show an example
ps.gut <- prune_samples(sample_sums(ps.gut) > 500, ps.gut)
# Convert to relative abundances
ps.gut.rel <- microbiome::transform(ps.gut, "compositional")
```
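As an optional sanity check, see how many ASVs and samples remain after filtering.
```{r check-dims, warning=FALSE, message=FALSE, eval=TRUE}
# How many ASVs and samples are left after filtering?
ntaxa(ps.gut.rel)
nsamples(ps.gut.rel)
```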
Using `phyloseq`, we perform the ordination analysis.
```{r get-ordination, warning=FALSE, message=FALSE, eval=TRUE}
# Ordination object
ps.ord <- ordinate(ps.gut.rel, "PCoA")
# Ordination object plus all metadata. Note: justDF = TRUE returns a data.frame instead of a plot
ordip <- plot_ordination(ps.gut.rel, ps.ord, justDF = TRUE)
```
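`ordip` is a plain `data.frame` that combines the PCoA coordinates (`Axis.1`, `Axis.2`, ...) with the sample metadata; an optional peek at a few of its columns:
```{r check-ordination, warning=FALSE, message=FALSE, eval=TRUE}
# Optional check: ordination axes and sample metadata live in one data.frame
colnames(ordip)[1:6]
head(ordip[, c("Axis.1", "Axis.2", "host_subject_id", "days_since_experiment_start")])
```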
Now, let's start to make a custom visualization.
```{r get-axis-var, warning=FALSE, message=FALSE, eval=TRUE}
# Get axis 1 and 2 variation
evals1 <- round(ps.ord$values$Eigenvalues[1] / sum(ps.ord$values$Eigenvalues) * 100, 2)
evals2 <- round(ps.ord$values$Eigenvalues[2] / sum(ps.ord$values$Eigenvalues) * 100, 2)
```
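As an optional cross-check, the same calculation can be applied to the leading eigenvalues to see how the variation is spread over the first few axes.
```{r check-eig, warning=FALSE, message=FALSE, eval=TRUE}
# Optional: variation (%) explained by the first five axes, using the same
# calculation as above applied to the leading eigenvalues
round(ps.ord$values$Eigenvalues[1:5] / sum(ps.ord$values$Eigenvalues) * 100, 2)
```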
Set some nice colors
```{r set-col, warning=FALSE, message=FALSE, eval=TRUE}
# theme_set(theme_bw(14))
# set colors
subject_cols <- c(F4 = "#457b9d", M3 = "#e63946")
```
Add the trajectory for a subject of interest. Here, we arbitrarily choose subject F4.
```{r subset-subject, warning=FALSE, message=FALSE, eval=TRUE}
# choose data for subject F4
dfs <- subset(ordip, host_subject_id == "F4")
# arrange according to sampling time. Sort with increasing time
dfs <- dfs %>%
arrange(days_since_experiment_start)
```
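Because `geom_path()` connects points in row order, it is worth a quick optional check that the sampling days really increase down the rows.
```{r check-order, warning=FALSE, message=FALSE, eval=TRUE}
# geom_path() connects points in row order, so check that the sampling days
# increase down the rows of dfs
head(dfs$days_since_experiment_start)
is.unsorted(dfs$days_since_experiment_start)
```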
Initiate plotting
```{r subject-plot, warning=FALSE, message=FALSE, eval=TRUE, fig.align='center', fig.width=6, fig.height=4}
# use the ordip
# first step get blank plot
p <- ggplot(ordip, aes(x = Axis.1, y = Axis.2))
# add path (lines) joining only the samples from subject F4
p2 <- p +
geom_path(
data = dfs, alpha = 0.7,
arrow = arrow(
angle = 15, length = unit(0.1, "inches"),
ends = "last", type = "closed"
)
) +
# now add a layer of points
geom_point(aes(color = host_subject_id), alpha = 0.6, size = 3) +
scale_color_manual("Subject", values = subject_cols) +
xlab(paste("PCoA 1 (", evals1, "%)", sep = "")) +
ylab(paste("PCoA 2 (", evals2, "%)", sep = "")) +
theme_bw() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()
)
# coord_fixed(sqrt(evals2 / evals1))
# Print figure
print(p2)
```
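The commented-out `coord_fixed()` line hints at a common refinement: scaling the axes in proportion to the variation they explain. A minimal sketch of that adjustment, reusing the `evals1` and `evals2` values computed above:
```{r subject-plot-fixed, warning=FALSE, message=FALSE, eval=TRUE, fig.align='center', fig.width=6, fig.height=4}
# Optional: fix the aspect ratio so that units on Axis.2 are scaled relative
# to units on Axis.1 according to the variation explained by each axis
p2 + coord_fixed(sqrt(evals2 / evals1))
```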
Alternatively, we can focus on just one subject.
```{r subject-plot-a, warning=FALSE, message=FALSE, eval=TRUE}
# subset data for only M3
ps.gut.rel.m3 <- subset_samples(ps.gut.rel, host_subject_id == "M3")
# remove ASVs that are zero in all of these samples
ps.gut.rel.m3 <- prune_taxa(taxa_sums(ps.gut.rel.m3) > 0, ps.gut.rel.m3)
```
```{r subject-plot-b, warning=FALSE, message=FALSE, eval=TRUE}
ps.ord.m3 <- ordinate(ps.gut.rel.m3, "PCoA")
ordip.m3 <- plot_ordination(ps.gut.rel.m3, ps.ord.m3, justDF = TRUE)
# Get axis 1 and 2 variation (the eigenvalues are stored in the ordination object ps.ord.m3, not in the data.frame ordip.m3)
evals1 <- round(ps.ord.m3$values$Eigenvalues[1] / sum(ps.ord.m3$values$Eigenvalues) * 100, 2)
evals2 <- round(ps.ord.m3$values$Eigenvalues[2] / sum(ps.ord.m3$values$Eigenvalues) * 100, 2)
# arrange according to sampling time
ordip.m3 <- ordip.m3 %>%
arrange(days_since_experiment_start) # important to arrange the time
```
Plot data
```{r subject-plot-c, warning=FALSE, message=FALSE, eval=TRUE, fig.align='center', fig.width=6, fig.height=4}
# Visualize
# blank plot initiate
p1 <- ggplot(ordip.m3, aes(x = Axis.1, y = Axis.2))
# add layers
p3 <- p1 +
# add arrows with geom_path
geom_path(alpha = 0.5, arrow = arrow(
angle = 30, length = unit(0.1, "inches"),
ends = "last", type = "closed"
)) +
# add points
geom_point(aes(color = days_since_experiment_start), size = 3) +
# add gradient colors
scale_color_viridis("Days from first sampling") +
# add x and y labels
xlab(paste("PCoA 1 (", evals1, "%)", sep = "")) +
ylab(paste("PCoA 2 (", evals2, "%)", sep = "")) +
theme_bw() +
# remove grids in the plot
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()
)
print(p3)
```
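As a final optional touch, we can highlight where the trajectory starts and ends (a minimal sketch; the `start`/`end` labels are added here purely for illustration). Since `ordip.m3` is already arranged by `days_since_experiment_start`, its first and last rows are the trajectory endpoints.
```{r subject-plot-d, warning=FALSE, message=FALSE, eval=TRUE, fig.align='center', fig.width=6, fig.height=4}
# first and last rows of the time-ordered data.frame are the trajectory endpoints
endpoints <- ordip.m3[c(1, nrow(ordip.m3)), ]
endpoints$timepoint <- c("start", "end")

# overlay open circles and labels on the existing trajectory plot
p3 +
  geom_point(data = endpoints, shape = 1, size = 5) +
  geom_text(data = endpoints, aes(label = timepoint), vjust = -1, size = 3)
```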