diff --git a/DESCRIPTION b/DESCRIPTION index 1d2a99e..7473423 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,28 +1,28 @@ Package: miaTime Type: Package Title: Microbiome Time Series Analysis -Version: 0.1.23 +Version: 0.99.0 Authors@R: c(person(given = "Leo", family = "Lahti", role = c("aut"), email = "leo.lahti@iki.fi", comment = c(ORCID = "0000-0001-5537-637X")), - person(given = "Yagmur", family = "Simsek", role = c("aut", "cre"), + person(given = "Tuomas", family = "Borman", role = c("aut", "cre"), + email = "tuomas.v.borman@utu.fi", + comment = c(ORCID = "0000-0002-8563-8884")), + person(given = "Yagmur", family = "Simsek", role = c("aut"), email = "yagmur.simsek@hsrw.org"), person(given = "Sudarshan", family = "Shetty", role = c("ctb"), email = "sudarshanshetty9@gmail.com"), - person(given = "Chouaib", family = "Benchraka", role=c("ctb")), - person(given = "Tuomas", family = "Borman", role = c("ctb"), - email = "tuomas.v.borman@utu.fi", - comment = c(ORCID = "0000-0002-8563-8884")), - person(given = "Muluh", family = "Muluh", role=c("ctb")) + person(given = "Chouaib", family = "Benchraka", role = c("ctb")), + person(given = "Muluh", family = "Muluh", role = c("ctb")) ) -Description: +Description: The `miaTime` package provides tools for microbiome time series analysis based on (Tree)SummarizedExperiment infrastructure. -biocViews: Microbiome, Software, Sequencing, Coverage +biocViews: Microbiome, Software, Sequencing License: Artistic-2.0 | file LICENSE Depends: - R (>= 4.0), + R (>= 4.4.0), mia Imports: dplyr, @@ -31,7 +31,7 @@ Imports: SingleCellExperiment, SummarizedExperiment, TreeSummarizedExperiment -Suggests: +Suggests: BiocStyle, devtools, ggplot2, @@ -41,17 +41,10 @@ Suggests: rmarkdown, scater, testthat, - tidySingleCellExperiment, - tidySummarizedExperiment, - TreeSummarizedExperiment, - vegan, - zoo -Remotes: - github::microbiome/mia -Encoding: UTF-8 + vegan URL: https://github.com/microbiome/miaTime +BugReports: https://github.com/microbiome/miaTime/issues +Encoding: UTF-8 Roxygen: list(markdown = TRUE) RoxygenNote: 7.3.2 VignetteBuilder: knitr -LazyData: false -Config/testthat/edition: 3 diff --git a/NAMESPACE b/NAMESPACE index eec4d77..1ec2e56 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -8,6 +8,8 @@ exportMethods(addBaselineDivergence) exportMethods(addStepwiseDivergence) exportMethods(getBaselineDivergence) exportMethods(getStepwiseDivergence) +import(TreeSummarizedExperiment) +import(mia) importFrom(dplyr,arrange) importFrom(dplyr,group_by) importFrom(dplyr,lag) diff --git a/NEWS b/NEWS index 70f0d07..977403d 100755 --- a/NEWS +++ b/NEWS @@ -1,3 +1,8 @@ +Changes in version 0.99.0 ++ Take into account Bioconductor requirements + +Changes in 0.1.23 ++ Restructure *BaselineDivergence and *StepwiseDivergence Changes in version 0.1.19 + changed assay_name to assay.type diff --git a/R/AllGenerics.R b/R/AllGenerics.R new file mode 100644 index 0000000..55b5d16 --- /dev/null +++ b/R/AllGenerics.R @@ -0,0 +1,22 @@ +# All generic methods are listed here + +#' @rdname addBaselineDivergence +#' @export +setGeneric("getBaselineDivergence", signature = "x", function(x, ...) + standardGeneric("getBaselineDivergence")) + +#' @rdname addBaselineDivergence +#' @export +setGeneric("addBaselineDivergence", signature = "x", function(x, ...) + standardGeneric("addBaselineDivergence")) + +#' @rdname addStepwiseDivergence +#' @export +#' +setGeneric("getStepwiseDivergence", signature = c("x"), function(x, ...) + standardGeneric("getStepwiseDivergence")) + +#' @rdname addStepwiseDivergence +#' @export +setGeneric("addStepwiseDivergence", signature = "x", function(x, ...) + standardGeneric("addStepwiseDivergence")) diff --git a/R/getBaselineDivergence.R b/R/getBaselineDivergence.R index 0307e6a..cb3f9fb 100644 --- a/R/getBaselineDivergence.R +++ b/R/getBaselineDivergence.R @@ -96,11 +96,6 @@ #' NULL -#' @rdname addBaselineDivergence -#' @export -setGeneric("getBaselineDivergence", signature = "x", function(x, ...) - standardGeneric("getBaselineDivergence")) - #' @rdname addBaselineDivergence #' @export setMethod("getBaselineDivergence", signature = c(x = "SummarizedExperiment"), @@ -163,11 +158,6 @@ setMethod("getBaselineDivergence", signature = c(x = "SummarizedExperiment"), } ) -#' @rdname addBaselineDivergence -#' @export -setGeneric("addBaselineDivergence", signature = "x", function(x, ...) - standardGeneric("addBaselineDivergence")) - #' @rdname addBaselineDivergence #' @export setMethod("addBaselineDivergence", signature = c(x = "SummarizedExperiment"), diff --git a/R/getStepwiseDivergence.R b/R/getStepwiseDivergence.R index adf9c95..6d4bbad 100644 --- a/R/getStepwiseDivergence.R +++ b/R/getStepwiseDivergence.R @@ -50,12 +50,6 @@ #' NULL -#' @rdname addStepwiseDivergence -#' @export -#' -setGeneric("getStepwiseDivergence", signature = c("x"), function(x, ...) - standardGeneric("getStepwiseDivergence")) - #' @rdname addStepwiseDivergence #' @export setMethod("getStepwiseDivergence", signature = c(x = "ANY"), @@ -113,11 +107,6 @@ setMethod("getStepwiseDivergence", signature = c(x = "ANY"), } ) -#' @rdname addStepwiseDivergence -#' @export -setGeneric("addStepwiseDivergence", signature = "x", function(x, ...) - standardGeneric("addStepwiseDivergence")) - #' @rdname addStepwiseDivergence #' @export setMethod("addStepwiseDivergence", signature = c(x = "SummarizedExperiment"), diff --git a/R/data.R b/R/miaTime.R similarity index 94% rename from R/data.R rename to R/miaTime.R index fa7df16..2a0b54c 100755 --- a/R/data.R +++ b/R/miaTime.R @@ -1,3 +1,16 @@ +#' \code{miaTime} Package. +#' +#' \code{miaTime} implements time series methods in \link[mia]{mia} ecosystem. +#' @name miaTime-package +#' @aliases miaTime +#' @seealso \link[TreeSummarizedExperiment:TreeSummarizedExperiment-class]{TreeSummarizedExperiment} class +"_PACKAGE" +NULL + +#' @import mia +#' @import TreeSummarizedExperiment +NULL + #' @title HITChip Atlas with 1006 Western Adults #' @description This data set contains genus-level microbiota profiling with #' HITChip for 1006 western adults with no reported health complications, diff --git a/README.md b/README.md index 9defc3d..ff3cd67 100644 --- a/README.md +++ b/README.md @@ -2,14 +2,17 @@ -[![R-CMD-check-bioc-devel](https://github.com/microbiome/miaTime/workflows/R-CMD-check-bioc-devel/badge.svg)](https://github.com/microbiome/miaTime/actions) - -[![Codecov test -coverage](https://codecov.io/gh/microbiome/miaTime/branch/master/graph/badge.svg)](https://codecov.io/gh/microbiome/miaTime?branch=master) +[![Platforms](http://bioconductor.org/shields/availability/release/miaTime.svg)](https://bioconductor.org/packages/release/bioc/html/miaTime.html) +[![rworkflows](https://github.com/microbiome/miaTime/actions/workflows/rworkflows.yml/badge.svg?branch=devel)](https://github.com/microbiome/miaTime/actions) +[![Bioc-release](http://bioconductor.org/shields/build/release/bioc/miaTime.svg)](http://bioconductor.org/packages/release/bioc/html/miaTime.html) +[![Bioc-age](http://bioconductor.org/shields/years-in-bioc/miaTime.svg)](https://bioconductor.org/packages/release/bioc/html/miaTime.html) +[![Codecov test coverage](https://codecov.io/gh/microbiome/miaTime/branch/devel/graph/badge.svg)](https://codecov.io/gh/microbiome/miaTime?branch=devel) +[![Dependencies](http://bioconductor.org//shields/dependencies/release/miaTime.svg)](https://bioconductor.org/packages/release/bioc/html/miaTime.html) -# miaTime +## Using the package + This R package can be used to analyse time series data for microbial communities. The package is part of [miaverse](https://microbiome.github.io/), and is based on the `TreeSummarizedExperiment` data container. @@ -18,16 +21,35 @@ See the [package homepage](https://microbiome.github.io/miaTime) for example workflows. ## Installation - -The package can be directly installed from R command line. +### Bioc-release -```r -devtools::install_github("microbiome/miaTime") -library(miaTime) ``` +if (!requireNamespace("BiocManager", quietly = TRUE)) + install.packages("BiocManager") + +BiocManager::install("miaTime") +``` + +### Bioc-devel -### Contributions and acknowledgments +``` +if (!requireNamespace("BiocManager", quietly = TRUE)) + install.packages("BiocManager") + +# The following initializes usage of Bioc devel +BiocManager::install(version='devel') + +BiocManager::install("miaTime") +``` + +### GitHub + +``` +remotes::install_github("microbiome/miaTime") +``` + +## Contributions and acknowledgments You can find us online from [Gitter](https://gitter.im/microbiome/miaverse). @@ -43,22 +65,8 @@ flow kind of approach. Development version should be done against the citation("miaTime") ``` -``` -## Kindly cite the miaTime R package as follows: -## -## (C) Yagmur Simsek and Leo Lahti. miaTime R package Version 0.1.21 -## Package URL: microbiome.github.io/miaTime -## -## A BibTeX entry for LaTeX users is -## -## @Misc{, -## title = {miaTime R package}, -## author = {Yagmur Simsek and Leo Lahti}, -## url = {microbiome.github.io/miaTime}, -## note = {Version 0.1.21}, -## } -``` - -# Code of conduct +## Code of conduct -The project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms. +The project is released with a +[Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). +By contributing to this project, you agree to abide by its terms. diff --git a/man/SilvermanAGutData.Rd b/man/SilvermanAGutData.Rd index 4c449d7..0c95473 100644 --- a/man/SilvermanAGutData.Rd +++ b/man/SilvermanAGutData.Rd @@ -1,5 +1,5 @@ % Generated by roxygen2: do not edit by hand -% Please edit documentation in R/data.R +% Please edit documentation in R/miaTime.R \docType{data} \name{SilvermanAGutData} \alias{SilvermanAGutData} diff --git a/man/addBaselineDivergence.Rd b/man/addBaselineDivergence.Rd index b86ae1c..48a03fd 100644 --- a/man/addBaselineDivergence.Rd +++ b/man/addBaselineDivergence.Rd @@ -1,14 +1,16 @@ % Generated by roxygen2: do not edit by hand -% Please edit documentation in R/getBaselineDivergence.R -\name{addBaselineDivergence} -\alias{addBaselineDivergence} +% Please edit documentation in R/AllGenerics.R, R/getBaselineDivergence.R +\name{getBaselineDivergence} \alias{getBaselineDivergence} +\alias{addBaselineDivergence} \alias{getBaselineDivergence,SummarizedExperiment-method} \alias{addBaselineDivergence,SummarizedExperiment-method} \title{Beta diversity between the baseline and later time steps} \usage{ getBaselineDivergence(x, ...) +addBaselineDivergence(x, ...) + \S4method{getBaselineDivergence}{SummarizedExperiment}( x, time.col, @@ -19,8 +21,6 @@ getBaselineDivergence(x, ...) ... ) -addBaselineDivergence(x, ...) - \S4method{addBaselineDivergence}{SummarizedExperiment}(x, name = "divergence", name.time = "time_diff", ...) } \arguments{ diff --git a/man/addStepwiseDivergence.Rd b/man/addStepwiseDivergence.Rd index cd47736..d8ddd9c 100644 --- a/man/addStepwiseDivergence.Rd +++ b/man/addStepwiseDivergence.Rd @@ -1,14 +1,16 @@ % Generated by roxygen2: do not edit by hand -% Please edit documentation in R/getStepwiseDivergence.R -\name{addStepwiseDivergence} -\alias{addStepwiseDivergence} +% Please edit documentation in R/AllGenerics.R, R/getStepwiseDivergence.R +\name{getStepwiseDivergence} \alias{getStepwiseDivergence} +\alias{addStepwiseDivergence} \alias{getStepwiseDivergence,ANY-method} \alias{addStepwiseDivergence,SummarizedExperiment-method} \title{Beta diversity between consecutive time steps} \usage{ getStepwiseDivergence(x, ...) +addStepwiseDivergence(x, ...) + \S4method{getStepwiseDivergence}{ANY}( x, time.col, @@ -19,8 +21,6 @@ getStepwiseDivergence(x, ...) ... ) -addStepwiseDivergence(x, ...) - \S4method{addStepwiseDivergence}{SummarizedExperiment}(x, name = "divergence", name.time = "time_diff", ...) } \arguments{ diff --git a/man/hitchip1006.Rd b/man/hitchip1006.Rd index f77a34e..255de46 100755 --- a/man/hitchip1006.Rd +++ b/man/hitchip1006.Rd @@ -1,5 +1,5 @@ % Generated by roxygen2: do not edit by hand -% Please edit documentation in R/data.R +% Please edit documentation in R/miaTime.R \docType{data} \name{hitchip1006} \alias{hitchip1006} diff --git a/man/miaTime-package.Rd b/man/miaTime-package.Rd new file mode 100644 index 0000000..41e150e --- /dev/null +++ b/man/miaTime-package.Rd @@ -0,0 +1,30 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/miaTime.R +\docType{package} +\name{miaTime-package} +\alias{miaTime-package} +\alias{miaTime} +\title{\code{miaTime} Package.} +\description{ +\code{miaTime} implements time series methods in \link[mia]{mia} ecosystem. +} +\seealso{ +\link[TreeSummarizedExperiment:TreeSummarizedExperiment-class]{TreeSummarizedExperiment} class +} +\author{ +\strong{Maintainer}: Yagmur Simsek \email{yagmur.simsek@hsrw.org} + +Authors: +\itemize{ + \item Leo Lahti \email{leo.lahti@iki.fi} (\href{https://orcid.org/0000-0001-5537-637X}{ORCID}) +} + +Other contributors: +\itemize{ + \item Sudarshan Shetty \email{sudarshanshetty9@gmail.com} [contributor] + \item Chouaib Benchraka [contributor] + \item Tuomas Borman \email{tuomas.v.borman@utu.fi} (\href{https://orcid.org/0000-0002-8563-8884}{ORCID}) [contributor] + \item Muluh Muluh [contributor] +} + +} diff --git a/man/minimalgut.Rd b/man/minimalgut.Rd index 41ece43..37d0214 100644 --- a/man/minimalgut.Rd +++ b/man/minimalgut.Rd @@ -1,5 +1,5 @@ % Generated by roxygen2: do not edit by hand -% Please edit documentation in R/data.R +% Please edit documentation in R/miaTime.R \docType{data} \name{minimalgut} \alias{minimalgut} diff --git a/man/temporalMicrobiome20.Rd b/man/temporalMicrobiome20.Rd index 33bd078..30475a3 100644 --- a/man/temporalMicrobiome20.Rd +++ b/man/temporalMicrobiome20.Rd @@ -1,5 +1,5 @@ % Generated by roxygen2: do not edit by hand -% Please edit documentation in R/data.R +% Please edit documentation in R/miaTime.R \docType{data} \name{temporalMicrobiome20} \alias{temporalMicrobiome20} diff --git a/vignettes/articles/manipulation.Rmd b/vignettes/articles/manipulation.Rmd index abee500..0976d7b 100644 --- a/vignettes/articles/manipulation.Rmd +++ b/vignettes/articles/manipulation.Rmd @@ -1,11 +1,6 @@ --- title: "Time series manipulation" date: "`r Sys.Date()`" -author: -- name: Leo Lahti - email: leo.lahti@iki.fi -- name: Yagmur Simsek - email: yagmur.simsek@hsrw.org package: miaTime output: @@ -22,134 +17,144 @@ vignette: > bibliography: references.bib --- -```{r eval=FALSE, include=FALSE} -library(Cairo) -knitr::opts_chunk$set(cache = FALSE, - fig.width = 9, - dpi=300, - dev = "png", - dev.args = list(type = "cairo-png"), - message = FALSE, - warning = FALSE) -``` - -```{r loadhide, message=FALSE, warning=FALSE, echo=FALSE} -library(mia) -library(miaTime) -library(dplyr) -library(lubridate) -library(SummarizedExperiment) +```{r} +#| label: setup +#| include: false + +library(knitr) +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + cache = TRUE +) ``` -# Introduction +## Introduction -`miaTime` implements tools for time series manipulation based on the `TreeSummarizedExperiment` [@TSE] data container. Much of the functionality is also applicable to the `SummarizedExperiment` [@SE] data objects. This tutorial shows how to use `miaTime` methods as well as the broader R/Bioconductor ecosystem to manipulate time series data. +`miaTime` implements tools for time series manipulation based on the +`r BiocStyle::Biocpkg("TreeSummarizedExperiment")` (`TreeSE`) data container. +Much of the functionality is also applicable to the +`r BiocStyle::Biocpkg("SummarizedExperiment")` data objects. This tutorial +shows how to use `miaTime` methods as well as the broader R/Bioconductor +ecosystem to manipulate time series data. -Check also the related package [TimeSeriesExperiment](https://github.com/nlhuong/TimeSeriesExperiment). +Check also the related package +[TimeSeriesExperiment](https://github.com/nlhuong/TimeSeriesExperiment). ## Installation -Installing the latest development version in R. +`miaTime` is hosted on Bioconductor, and can be installed using via +`BiocManager`. + +```{r} +#| label: install +#| eval: false -```{r, eval=FALSE} -library(devtools) -devtools::install_github("microbiome/miaTime") +BiocManager::install("miaTime") ``` -Loading the package: +Once installed, `miaTime` is made available in the usual way. + +```{r} +#| label: load_package -```{r load-packages, message=FALSE, warning=FALSE} library(miaTime) ``` - ## Sorting samples -The _tidySingleCellExperiment_ package provides handy functions for (Tree)SE manipulation. +To sort data based on subject and time point in base R, you can use the +`order()` function. + +```{r} +#| label: sort_samples -```{r sort samples, message=FALSE, warning=FALSE} -library("tidySingleCellExperiment") data(hitchip1006) tse <- hitchip1006 -tse2 <- tse %>% arrange(subject, time) -``` +index <- order(tse[["subject"]], tse[["time"]]) +tse <- tse[ , index] +``` -## Storing time information with Period class +## Storing time information with `period` class `miaTime` utilizes the functions available in the package `lubridate` -to convert time series field to "Period" class object. This gives access to a -number of readily available [time series manipulation tools](https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html). +to convert time series field to `period` class object. This gives access to a +number of readily available +[time series manipulation tools](https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html). Load example data: ```{r} +#| label: lubridate + # Load packages -library(miaTime) library(lubridate) -library(SummarizedExperiment) - -# Load demo data -data(hitchip1006) -tse <- hitchip1006 # Time is given in days in the demo data. # Convert days to seconds -time_in_seconds <- 60*60*24*colData(tse)[,"time"] +time_in_seconds <- 60*60*24*tse[["time"]] # Convert the time data to period class -Seconds <- as.period(time_in_seconds, unit="sec") +seconds <- as.period(time_in_seconds, unit = "sec") # Check the output -Seconds[1140:1151] +seconds |> tail() ``` - ## Conversion between time units The time field in days is now shown in seconds. It can then be -converted to many different units using the lubridate package. +converted to many different units using the `lubridate` package. ```{r} -Hours <- as.period(Seconds, unit = "hour") -Hours[1140:1151] +#| label: hours + +hours <- as.period(seconds, unit = "hour") +hours |> tail() ``` The updated time information can then be added to the `SummarizedExperiment` data object as a new `colData` (sample data) field. - ```{r} -colData(tse)$timeSec <- Seconds +#| label: add_seconds + +colData(tse)$time_sec <- seconds colData(tse) ``` ## Calculating time differences -The \link[lubridate]{as.duration} function helps to specify time points as duration. +The `lubridate::as.duration()` function helps to specify time points as +duration. ```{r} -Duration <- as.duration(Seconds) -Duration[1140:1151] +#| label: duration + +duration <- as.duration(seconds) +duration |> tail() ``` The difference between subsequent time points can then be calculated. ```{r} -Timediff <- diff(Duration) -Timediff <- c(NA, Timediff) -Timediff[1140:1151] +#| label: time_diff + +time_diff <- diff(duration) +time_diff <- c(NA, time_diff) +time_diff |> tail() ``` The time difference from a selected point to the other time points can be calculated as follows. ```{r} -base <- Hours - Hours[1] #distance from starting point -base[1140:1151] +#| label: time_diff2 -base_1140 <- Seconds - Seconds[1140] -base_1140[1140:1151] +# Difference from second time point +time_diff <- hours - sort(unique(hours))[[2]] +time_diff |> tail() ``` ## Time point rank @@ -157,52 +162,47 @@ base_1140[1140:1151] Rank of the time points can be calculated by `rank` function provided in base R. ```{r} -colData(tse)$rank <- rank(colData(tse)$time) +#| label: rank + +tse[["time"]] <- rank(tse[["time"]]) colData(tse) ``` - ## Operations per unit -Sometimes we need to operate on time series per unit (subject, -reaction chamber, sampling location, ...). +Sometimes we need to operate on time series per unit (subject, reaction chamber, +sampling location, ...). Add time point rank per subject. ```{r} +#| label: rank_for_subject + library(dplyr) -colData(tse) <- colData(tse) %>% - as.data.frame() %>% - group_by(subject) %>% - mutate(rank = rank(time, ties.method="average")) %>% + +colData(tse) <- colData(tse) |> + as.data.frame() |> + group_by(subject) |> + mutate(rank = rank(time, ties.method = "average")) |> DataFrame() ``` - ## Subset to baseline samples -```{r} -library(tidySummarizedExperiment) - -# Pick samples with time point 0 -tse <- hitchip1006 |> filter(time == 0) -# Or: tse <- tse[, tse$time==0] +`TreeSE` consists of rows for features and columns for samples. If we are +specifically interested in baseline samples, we can easily subset the data as +follows. -# Sample with the smallest time point within each subject -colData(tse) <- colData(tse) %>% - as.data.frame() %>% - group_by(subject) %>% - mutate(rank = rank(time, ties.method="average")) %>% - DataFrame +```{r} +#| label: subset -# Pick the subset including first time point per subject -tse1 <- tse[, tse$rank == 1] +tse <- tse[, tse$time==0] ``` - - # Session info ```{r} +#| label: session_info + sessionInfo() ``` diff --git a/vignettes/articles/minimalgut.Rmd b/vignettes/articles/minimalgut.Rmd index 9e99cde..bee4822 100644 --- a/vignettes/articles/minimalgut.Rmd +++ b/vignettes/articles/minimalgut.Rmd @@ -1,9 +1,6 @@ --- title: "Minimal gut bioreactor examples" date: "`r Sys.Date()`" -author: -- name: Sudarshan Shetty -- name: Leo Lahti package: miaTime output: @@ -20,58 +17,49 @@ vignette: > bibliography: references.bib --- -```{r eval=FALSE, include=FALSE} -library(Cairo) -knitr::opts_chunk$set(cache = FALSE, - fig.width = 9, - dpi=300, - dev = "png", - dev.args = list(type = "cairo-png"), - message = FALSE, - warning = FALSE) -``` +```{r} +#| label: setup +#| include: false -```{r loadhide, message=FALSE, warning=FALSE, echo=FALSE} -library(mia) -library(miaTime) -library(dplyr) -library(lubridate) library(knitr) -library(TreeSummarizedExperiment) +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + cache = TRUE +) ``` - # Minimal gut microbiome -Dense samples of the minimal gut microbiome. In the initial hours, MDb-MM was grown under batch condition and 24 h onwards, continuous feeding of media with pulse feeding cycles. This information is stored in the `colData`. +Dense samples of the minimal gut microbiome. In the initial hours, MDb-MM was +grown under batch condition and 24 h onwards, continuous feeding of media with +pulse feeding cycles. This information is stored in the `colData`. + +```{r} +#| label: sample_table -```{r fig.width=8, fig.height=3} library(miaTime) + data(minimalgut) tse <- minimalgut -# quick check of number of samples -kable(table(colData(tse)$StudyIdentifier,colData(tse)$condition_1)) +# Quick check of number of samples +table(tse[["StudyIdentifier"]], tse[["condition_1"]]) ``` Visualize samples available for each of the bioreactors. This allows to identify if there are any missing samples for specific times. - ```{r} +#| label: show_timepoints + library(ggplot2) -library(tidySummarizedExperiment) -tse |> - ggplot(aes(as.factor(Time.hr), StudyIdentifier)) + - geom_tile(aes(fill=condition_1), color="white") + - scale_fill_manual("Condition Sampled", - values = c("#ff006e", "#e07a5f", "#457b9d")) + - theme_minimal() + - theme(axis.text.x = element_text(size=8, angle = 90), - legend.position = "top") + - labs(x="Time (h)", y="") -``` +colData(tse) |> + ggplot() + + geom_tile( + aes(x = as.factor(Time.hr), y = StudyIdentifier, fill = condition_1)) +``` ## Community dynamics @@ -81,229 +69,150 @@ can investigate the succession of mdbMM16 from the start of experiment here hour zero until the end of the experiment. ```{r} -## Divergence from baseline i.e from hour zero. -tse <- transformAssay(minimalgut, method = "relabundance") # get relative abundance -tse <- addBaselineDivergence(tse, - group = "StudyIdentifier", - time.col = "Time.hr", - name = "divergence_from_baseline", - name.time = "time_from_baseline", - assay.type = "relabundance", - method = "bray") - +#| label: calculate_divergence + +# Transform data to relativeS +tse <- transformAssay(tse, method = "relabundance") +# Divergence from baseline i.e from hour zero +tse <- addBaselineDivergence( + tse, + assay.type = "relabundance", + method = "bray", + group = "StudyIdentifier", + time.col = "Time.hr", + ) ``` +Let's then visualize the divergence. + +```{r} +#| label: show_divergence + +library(scater) -Visualize the divergence - -```{r fig.height=4, fig.width=8} -# First define nice colors for bioreactors -bioreac_cols <- c(`Bioreactor A`= "#b2182b", - `Bioreactor B`= "#2166ac", - `Bioreactor C` = "#35978f") - -tse |> - ggplot(aes(x=Time.hr, y=divergence_from_baseline))+ - geom_point(aes(color=StudyIdentifier), size=2, alpha=0.5) + - geom_line(aes(color=StudyIdentifier)) + - theme_minimal() + - scale_color_manual(values = bioreac_cols) + - labs(x="Time (h)", y="Divergence \nfrom baseline") + - # highlight specific timepoints - geom_vline(xintercept = 152, lty=2, color="#991720") + - geom_vline(xintercept = 248, lty=2, color= "#0963bd")+ - annotate("text",x=c(152, 248),y=c(0.8, 0.8), - label=c("Addition of\nB.hydrogenotrophica","Acetate Discontinued"), - hjust=c(1.05,-0.05)) +# Create a time series plot for divergence +p <- plotColData( + tse, x = "Time.hr", y = "divergence", colour_by = "StudyIdentifier") + + # Add line between points + geom_line(aes(group = .data[["colour_by"]], colour = .data[["colour_by"]])) +p ``` ## Visualizing selected taxa -Now visualize abundance of *Blautia hydrogenotrophica* using the `miaViz::plotSeries` function. +Now visualize abundance of _Blautia hydrogenotrophica_ using the +`miaViz::plotSeries()` function. -```{r fig.height=4, fig.width=8} -library(miaViz) -plotSeries(transformAssay(minimalgut, method = "relabundance"), - x = "Time.hr", - y = "Blautia_hydrogenotrophica", - colour_by = "Species", - assay.type = "relabundance")+ - geom_vline(xintercept = 152, lty=2, color="#991720") + - geom_vline(xintercept = 248, lty=2, color= "#0963bd")+ - annotate("text",x=c(152, 248),y=c(0.2, 0.15), - label=c("Addition of\nB.hydrogenotrophica","Acetate Discontinued"), - hjust=c(1.05,-0.05))+ - labs(x="Time (h)", y="B.hydrogenotrophica\nRelative Abundance") + - theme(legend.position = "none") -``` +```{r} +#| label: plot_series +library(miaViz) +# Plot certain feature by time +p <- plotSeries( + tse, + x = "Time.hr", y = "Blautia_hydrogenotrophica", colour_by = "Species", + assay.type = "relabundance") +p +``` ## Visualize the rate (slope) of divergence Sample dissimilarity between consecutive time steps(step size n >= 1) within -a group(subject, age, reaction chamber, etc.) can be calculated by `addStepwiseDivergence`. If we normalize this by the time interval, this gives an approximate slope for the change. - - -```{r addStepwiseDivergence, fig.height=4, fig.width=8, warning=FALSE} -# Load libraries -library(miaTime) -library(dplyr) +a group(subject, age, reaction chamber, etc.) can be calculated by +`addStepwiseDivergence`. -# Sort samples by time (necessary for addStepwiseDivergence) -tse <- tse[, order(colData(tse)$Time_hr_num)] +```{r} +#| label: stepwise_divergence # Divergence between consecutive time points -tse <- addStepwiseDivergence(tse, group = "StudyIdentifier", - time_interval = 1, - time.col = "Time_hr_num", - name = "divergence_from_previous_step", - name.time = "time_from_previous_step", - assay.type = "relabundance", - method = "bray") - -# We have now new fields added in the colData: -# time_from_previous_step, divergence_from_previous_step -# print(colData(tse)) - -# Visualize the slope of dissimilarity between consecutive time points as a function of time (from baseline) -library(ggplot2) -theme_set(theme_bw(10)) -p <- tse |> ggplot(aes(x=time_from_baseline, - y=divergence_from_previous_step/time_from_previous_step, - color=StudyIdentifier)) + - geom_point() + - geom_line() + - labs(x="Time (hours)", y="Slope of dissimilarity (Bray-Curtis)") + - geom_hline(aes(yintercept=0), linetype=2) - -print(p) +tse <- addStepwiseDivergence( + tse, + assay.type = "relabundance", + method = "bray", + group = "StudyIdentifier", + time.interval = 1, + time.col = "Time.hr", + name = "divergence_from_previous_step", + name.time = "time_from_previous_step" + ) ``` -## Moving average of the slope +The results are again stored in `colData`. We calculate the speed of divergence +change by dividing each divergence change by the corresponding change in time. +Then we use similar plotting methods as previously. -This shows how to calculate and plot moving average for the variable of interest (here: slope). - -```{r rollmean, fig.height=4, fig.width=8, warning=FALSE} -# Add slope explicitly in colData -colData(tse)$slope <- colData(tse)$divergence_from_previous_step / colData(tse)$time_from_previous_step - -# Split by group and perform operation -tselist <- splitOn(tse, "StudyIdentifier") - -# colData(tse)$divergence_from_previous_step -addmean <- function (x, k, field, field_name) { - # Calculate rolling mean - m <- zoo::rollmean(x[[field]], k = k) - # Initialize empty field - colData(x)[[field_name]] <- rep(NA, ncol(x)) - # Fill in the rolling mean (length does not match with original data in rolling mean) - colData(x)[1:length(m), field_name] <- m - # Return the object with a new field added - x -} - -# Calculate sliding average for the field "divergence_from_previous_step" -# and store the result in a new field with the name "sliding_average" -tselist2 <- lapply(tselist, function (x) {addmean(x, k=3, field = "slope", field_name = "sliding_average")}) - -# Merge back -tse <- mergeSEs(tselist2) - -# Visualize -theme_set(theme_bw(10)) -p <- tse |> ggplot(aes(x = time_from_baseline, - y = sliding_average, - color=StudyIdentifier)) + - geom_point() + - geom_line() + - labs(x="Time (hours)", y="Mean slope of dissimilarity (Bray-Curtis)") + - geom_hline(aes(yintercept=0), linetype=2) -print(p) +```{r} +#| label: show_stepwise + +# Calculate slope for the change +tse[["divergence_change"]] <- tse[["divergence_from_previous_step"]] / + tse[["time_from_previous_step"]] + +# Create a time series plot for divergence +p <- plotColData( + tse, + x = "Time.hr", + y = "divergence_change", + colour_by = "StudyIdentifier" + ) + + # Add line between points + geom_line(aes(group = .data[["colour_by"]], colour = .data[["colour_by"]])) +p ``` +## Moving average of the slope +This shows how to calculate and plot moving average for the variable of +interest (here: slope). -## Error bars - -This shows how to visualize error bars on top of time series. In this -example we create artificial replicates and variation for a brief example. - - -```{r errorbars, fig.height=4, fig.width=8, warning=FALSE, message=FALSE} -## Source: https://www.geeksforgeeks.org/adding-error-bars-to-a-line-graph-with-ggplot2-in-r/ -## Gives count, mean, standard deviation, standard error of the mean, -## and confidence interval (default 95%). -## data: a data frame. -## measurevar: the name of a column that contains the variable to be summariezed -## groupvars: a vector containing names of columns that contain grouping variables -## na.rm: a boolean that indicates whether to ignore NA's -## conf.interval: the percent range of the confidence interval (default is 95%) -summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE, - conf.interval=.95, .drop=TRUE) { - library(plyr) - - # New version of length which can handle NA's: if na.rm==T, don't count them - length2 <- function (x, na.rm=FALSE) { - if (na.rm) sum(!is.na(x)) - else length(x) - } - - # This does the summary. For each group's data frame, return a vector with - # N, mean, and sd - datac <- ddply(data, groupvars, .drop=.drop, - .fun = function(xx, col) { - c(N = length2(xx[[col]], na.rm=na.rm), - mean = mean (xx[[col]], na.rm=na.rm), - sd = sd (xx[[col]], na.rm=na.rm) - ) - }, - measurevar - ) +```{r} +#| label: moving_average - # Rename the "mean" column - datac <- rename(datac, c("mean" = measurevar)) - - datac$se <- datac$sd / sqrt(datac$N) # Calculate standard error of the mean - - # Confidence interval multiplier for standard error - # Calculate t-statistic for confidence interval: - # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1 - ciMult <- qt(conf.interval/2 + .5, datac$N-1) - datac$ci <- datac$se * ciMult - - return(datac) -} - -# Create artificial example on replicates because we don't have any in this demo data -set.seed(3422) -df <- as.data.frame(colData(tse)) -sdlevel <- 0.1*mean(df$sliding_average, na.rm=TRUE) -df1 <- df -df2 <- df; df2$sliding_average <- rnorm(mean=df1$sliding_average, sd = sdlevel, n=nrow(df1)) -df3 <- df; df3$sliding_average <- rnorm(mean=df1$sliding_average, sd = sdlevel, n=nrow(df1)) -df <- bind_rows(list(df1, df2, df3)) - -# Calculate deviations -df <- summarySE(df, measurevar="sliding_average", groupvars=c("StudyIdentifier", "time_from_baseline")) - -# Visualize -theme_set(theme_bw(10)) -p <- ggplot(df, - aes(x = time_from_baseline, y = sliding_average, color=StudyIdentifier)) + - geom_point() + - geom_line() + - geom_errorbar(aes(ymin=sliding_average-sd, ymax=sliding_average+sd), width=.2, - position=position_dodge(0.05)) + - labs(x="Time (hours)", y="Mean slope of dissimilarity (Bray-Curtis)") + - geom_hline(aes(yintercept=0), linetype=2) -print(p) +library(dplyr) + +# Calculate moving average with time window of 3 time points +tse[["sliding_divergence"]] <- colData(tse) |> + as.data.frame() |> + # Group based on reactor + group_by(StudyIdentifier) |> + # Calculate moving average + mutate(sliding_avg = ( + # We get the previous 2 samples + lag(divergence_change, 2) + + lag(divergence_change, 1) + + # And the current sample + divergence_change + # And take average + ) / 3 + ) |> + # Get only the values as vector + ungroup() |> + pull(sliding_avg) ``` +After calculating the moving average of divergences, we can visualize the +result in a similar way to our previous approach. +```{r} +#| label: show_moving_average + +# Create a time series plot for divergence +p <- plotColData( + tse, + x = "Time.hr", + y = "sliding_divergence", + colour_by = "StudyIdentifier" + ) + + # Add line between points + geom_line(aes(group = .data[["colour_by"]], colour = .data[["colour_by"]])) +p +``` # Session info ```{r} +#| label: session_info + sessionInfo() ``` diff --git a/vignettes/miaTime.Rmd b/vignettes/miaTime.Rmd index 11e3ce5..9524e70 100755 --- a/vignettes/miaTime.Rmd +++ b/vignettes/miaTime.Rmd @@ -1,11 +1,6 @@ --- title: "miaTime: Microbiome Time Series Analysis" date: "`r Sys.Date()`" -author: -- name: Leo Lahti - email: leo.lahti@iki.fi -- name: Yagmur Simsek - email: yagmur.simsek@hsrw.org package: miaTime output: @@ -21,45 +16,102 @@ vignette: > %\VignetteEncoding{UTF-8} --- -```{r eval=FALSE, include=FALSE} -library(Cairo) -knitr::opts_chunk$set(cache = FALSE, - fig.width = 9, - dpi=300, - dev = "png", - dev.args = list(type = "cairo-png"), - message = FALSE, - warning = FALSE) +```{r} +#| label: setup +#| include: false + +library(knitr) +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + cache = TRUE +) ``` +## Introduction +`miaTime` is a package in the `r BiocStyle::Biocpkg("mia")` family, providing +tools for time series manipulation using the +`r BiocStyle::Biocpkg("TreeSummarizedExperiment")` data container. +## Installation -# Introduction +`miaTime` is hosted on Bioconductor, and can be installed using via +`BiocManager`. -`miaTime` implements tools for time series manipulation based on the `TreeSummarizedExperiment` data container. +```{r} +#| label: install +#| eval: false -## Installation +BiocManager::install("miaTime") +``` -Installing the latest development version in R. +## Load the package -```{r, eval=FALSE} -library(devtools) -devtools::install_github("microbiome/miaTime") -``` +Once installed, `miaTime` is made available in the usual way. -Loading the package: +```{r} +#| label: load_package -```{r load-packages, message=FALSE, warning=FALSE} library(miaTime) ``` +## Divergence between time points + +`miaTime` offers functions to calculate divergences. These can be calculated +based on samples and their corresponding base time point, e.g., first sample of +time series. Moreover, divergences can be calculated in rolling basis meaning +that a sample is compared to previous ith sample. + +Divergences can be calculated with `get*Divergence()` functions. In the example +below, for each subject, we calculate the divergence of their samples by +comparing them to the first time point. + +```{r} +#| label: base_divergence + +data(hitchip1006) +tse <- hitchip1006 + +res <- getBaselineDivergence( + tse, time.col = "time", group = "sample", name = "baseline") +res |> head() +``` + +A more convenient and preferred approach is to store the values directly in +`colData` using the `get*Divergence()` functions. In the example below, we +calculate stepwise divergences with a lag of 1, meaning that for each sample, +the divergence is calculated by comparing it to the previous time point for +the same subject. -See [articles](https://microbiome.github.io/miaTime/articles/) for more detailed example workflows. +```{r} +#| label: time_divergence +tse <- addStepwiseDivergence(tse, time.col = "time") +colData(tse) +``` + +## Visualize time series + +We can visualize time series data with `r BiocStyle::Biocpkg("miaViz")`. Below, +we visualize 2 most abundant taxa. + +```{r} +#| label: plot_series + +library(miaViz) + +p <- plotSeries(tse, x = "time", y = getTop(tse, 5)) +p +``` + +See [articles](https://microbiome.github.io/miaTime/articles/) for more detailed +example workflows. ## Session info ```{r} +#| label: session_info + sessionInfo() ```