From 6994a432465b8491b58959b2b96cf25a6d22a1f0 Mon Sep 17 00:00:00 2001
From: Dimitris Rizopoulos <d.rizopoulos@erasmusmc.nl>
Date: Tue, 23 Apr 2024 17:48:21 +0200
Subject: [PATCH] updates

---
 vignettes/Competing_Risks.Rmd           | 10 ++++-----
 vignettes/Dynamic_Predictions.Rmd       | 27 +++++++++--------------
 vignettes/JMbayes2.Rmd                  | 19 ++++++++--------
 vignettes/Multi_State_Processes.Rmd     | 29 +++++++++----------------
 vignettes/Non_Gaussian_Mixed_Models.Rmd | 20 ++++++++---------
 vignettes/Recurring_Events.Rmd          | 21 +++++++++---------
 vignettes/Super_Learning.Rmd            | 18 +++++++--------
 vignettes/Time_Varying_Effects.Rmd      |  8 +++----
 vignettes/Transformation_Functions.Rmd  |  8 +++----
 9 files changed, 70 insertions(+), 90 deletions(-)

diff --git a/vignettes/Competing_Risks.Rmd b/vignettes/Competing_Risks.Rmd
index dfd48f65..3ef15fe8 100644
--- a/vignettes/Competing_Risks.Rmd
+++ b/vignettes/Competing_Risks.Rmd
@@ -21,8 +21,8 @@ library("JMbayes2")
 
 # Competing Risks
 ## Prepare data
-The first step to fit a joint model for competing events in **JMbayes2** is to prepare the data for the event process. If there are $K$ competing events, then each subject needs to have $K$ rows, one for each possible cause. The observed event time $T_i$ of each subject is repeated $K$ times, and there are two indicator variables, namely one identifying the cause, and one indicating whether the corresponding event type is the one that occurred. Standard survival datasets that included a single row
-per patient, can be easily transformed to the competing risks long format using function `crisk_setup()`. This function accepts as main arguments the survival data in the standard format that has a single row per patient, the name of the status variable, and the level in this status variable that corresponds to censoring. We illustrate the use of this function in the PBC data, in which we treat as competing risks transplantation and death:
+The first step in fitting a joint model for competing events in **JMbayes2** is to prepare the data for the event process. If there are $K$ competing events, each subject must have $K$ rows, one for each possible cause. The observed event time $T_i$ of each subject is repeated $K$ times, and there are two indicator variables, namely one identifying the cause and one indicating whether the corresponding event type is the one that occurred. Standard survival datasets that include a single row
+per patient can be easily transformed to the competing risks long format using the function `crisk_setup()`. This function accepts as main arguments the survival data in the standard format with a single row per patient, the name of the status variable, and the level in this status variable that corresponds to censoring. We illustrate the use of this function in the PBC data, in which we treat as competing risks transplantation and death:
 ```{r, "prepare_data"}
 pbc2.id[pbc2.id$id %in% c(1, 2, 5), c("id", "years", "status")]
 
@@ -43,7 +43,7 @@ CoxFit_CR <- coxph(Surv(years, status2) ~ (age + drug) * strata(CR),
                      data = pbc2.idCR)
 ```
 
-For the longitudinal process, we include two longitudinal outcomes, namely serum bilirubin and the prothrombin time. For the former we use quadratic orthogonal polynomials in the fixed- and random-effects parts, and for the latter linear evolutions:
+We include two longitudinal outcomes for the longitudinal process, namely serum bilirubin and the prothrombin time. For the former, we use quadratic orthogonal polynomials in the fixed- and random-effects parts, and for the latter, linear evolutions:
 ```{r, "mixed models"}
 fm1 <- lme(log(serBilir) ~ poly(year, 2) * drug, data = pbc2, 
            random = ~ poly(year, 2) | id)
@@ -69,14 +69,14 @@ summary(jFit_CR)
 ```
 
 ## Dynamic predictions
-Based on the fitted competing risks joint model, we will illustrate how (dynamic) predictions for the cause-specific cumulative risk probabilities can be calculated. As an example, we will show these calculations for Patient 81 from the PBC dataset. First, we extract the data of this subject. 
+Based on the fitted competing risks joint model, we will illustrate how (dynamic) predictions can be calculated for the cause-specific cumulative risk probabilities. We will show these calculations for Patient 81 from the PBC dataset as an example. First, we extract the data on this subject. 
 ```{r, "data_P81"}
 ND_long <- pbc2[pbc2$id == 81, ]
 ND_event <- pbc2.idCR[pbc2.idCR$id == 81, ]
 ND_event$status2 <- 0
 ND <- list(newdataL = ND_long, newdataE = ND_event)
 ```
-The first line extracts the longitudinal measurements, and the second line extracts the event times per cause (i.e., death and transplantation). This particular patient died at 6.95 years, but to make the calculation of cause-specific cumulative risk more relevant, we presume that she did not have the event, and we set the event status variable `status2` to zero. The last line combines the two datasets in a list. *Note:* this last step is a prerequisite from the `predict()` method for competing risks joint model. That is, the datasets provided in the arguments `newdata` and `newdata2` need to be named lists with two components. The first component needs to be named `newdataL` and contain the dataset with the longitudinal measurements, and the second component needs to be named `newdataE` and contain the dataset with the event information.
+The first line extracts the longitudinal measurements, and the second line extracts the event times per cause (i.e., death and transplantation). This patient died at 6.95 years, but to make the calculation of cause-specific cumulative risk more relevant, we presume that she did not have the event, and we set the event status variable `status2` to zero. The last line combines the two datasets in a list. *Note:* this last step is a prerequisite from the `predict()` method for competing risks joint model. That is, the datasets provided in the arguments `newdata` and `newdata2` need to be named lists with two components. The first component needs to be named `newdataL` and contain the dataset with the longitudinal measurements. The second component needs to be named `newdataE` and contain the dataset with the event information.
 
 The predictions are calculated using the `predict()` method. The first call to this function calculates the prediction for the longitudinal outcomes at the times provided in the `times` argument, and the second call calculates the cause-specific cumulative risk probabilities. By setting the argument `return_newdata` to `TRUE` in both calls, we can use the corresponding `plot()` method to depict the predictions:
 ```{r, "CIFs"}
diff --git a/vignettes/Dynamic_Predictions.Rmd b/vignettes/Dynamic_Predictions.Rmd
index 68bd9507..9be29b2b 100644
--- a/vignettes/Dynamic_Predictions.Rmd
+++ b/vignettes/Dynamic_Predictions.Rmd
@@ -27,10 +27,8 @@ $$\begin{array}{l}
 \pi_j(u \mid t) = \mbox{Pr}\{T_j^* \leq u \mid T_j^* > t, \mathcal Y_j(t), \mathcal D_n\}\\\\
 = \displaystyle 1 - \int\int \frac{S(u \mid b_j, \theta)}{S(t \mid b_j, \theta)} \; p\{b_j \mid T_j^* > t, \mathcal Y_j(t), \theta\} \; p(\theta \mid \mathcal D_n) \; db_j d\theta,
 \end{array}$$
-where $S(\cdot)$ denotes the survival function conditional on the random effects, and 
-$\mathcal Y_j(t) = \{\mathcal Y_{1j}(t), \ldots, \mathcal Y_{Kj}(t)\}$. Combining the 
-three terms in the integrand we can device a Monte Carlo scheme to obtain estimates of 
-these probabilities, namely,
+where $S(\cdot)$ denotes the survival function conditional on the random effects and $\mathcal Y_j(t) = \{\mathcal Y_{1j}(t), \ldots, \mathcal Y_{Kj}(t)\}$. Combining the 
+three terms in the integrand, we can devise a Monte Carlo scheme to obtain estimates of these probabilities, namely,
 
 1. Sample a value $\tilde \theta$ from the posterior of the parameters 
 $[\theta \mid \mathcal D_n]$.
@@ -50,9 +48,7 @@ and their standard error by calculating the standard deviation across the Monte
 samples.
 
 ## Example
-We will illustrate the calculation of dynamic predictions using package **JMbayes2** from a
-trivariate joint model fitted to the PBC dataset for the longitudinal outcomes `serBilir` (continuous),
-`prothrombin` time (continuous) and `ascites` (dichotomous). We start by fitting the univariate mixed models. For the two continuous outcomes, we allow for nonlinear subject-specific time effects using natural cubic splines. For `ascites`, we postulate linear subject-specific profiles for the log odds. The code is:
+We will illustrate the calculation of dynamic predictions using package **JMbayes2** from a trivariate joint model fitted to the PBC dataset for the longitudinal outcomes `serBilir` (continuous), `prothrombin` time (continuous), and `ascites` (dichotomous). We start by fitting the univariate mixed models. For the two continuous outcomes, we allow for nonlinear subject-specific time effects using natural cubic splines. For `ascites`, we postulate linear subject-specific profiles for the log odds. The code is:
 ```{r}
 fm1 <- lme(log(serBilir) ~ ns(year, 3) * sex, data = pbc2,
            random = ~ ns(year, 3) | id, control = lmeControl(opt = 'optim'))
@@ -84,19 +80,19 @@ ND$status2 <- 0
 ND$years <- t0
 ```
 
-We will only use the first five years of follow-up (line three), and further we specify that the patients were event-free up to this time point (lines four and five).
+We will only use the first five years of follow-up (line three) and specify that the patients were event-free up to this point (lines four and five).
 
-We start with predictions for the longitudinal outcomes. These are produced by the `predict()` method for class `jm` objects, and follow the same lines as the procedure described above for cumulative risk probabilities. The only difference is in Step 3, where instead of calculating the cumulative risk we calculate the predicted values for the longitudinal outcomes. There are two options controlled by the `type_pred` argument, namely predictions at the scale of the response/outcome (default) or at the linear predictor level. The `type` argument controls if the predictions will be for the mean subject (i.e., including only the fixed effects) or subject-specific including both the fixed and random effects. In the `newdata` argument we provide the available measurements of the two patients. This will be used to sample their random effects at Step 2 presented above. This is done with a Metropolis-Hastings algorithm that runs for `n_mcmc` iterations; all iterations but the last one are discarded as burn-in. Finally, argument `n_samples` corresponds to the value of $L$ defined above and specifies the number of Monte Carlo samples: 
+We start with predictions for the longitudinal outcomes. These are produced by the `predict()` method for class `jm` objects and follow the same lines as the procedure described above for cumulative risk probabilities. The only difference is in Step 3, where instead of calculating the cumulative risk, we calculate the predicted values for the longitudinal outcomes. There are two options controlled by the `type_pred` argument, namely predictions at the scale of the response/outcome (default) or at the linear predictor level. The `type` argument controls whether the predictions will be for the mean subject (i.e., including only the fixed effects) or subject-specific, including both the fixed and random effects. In the `newdata` argument we provide the available measurements of the two patients. This will be used to sample their random effects in Step 2, presented above. This is done with a Metropolis-Hastings algorithm that runs for `n_mcmc` iterations; all iterations but the last one are discarded as burn-in. Finally, argument `n_samples` corresponds to the value of $L$ defined above and specifies the number of Monte Carlo samples: 
 ```{r}
 predLong1 <- predict(jointFit, newdata = ND, return_newdata = TRUE)
 ```
 
-Argument `return_newdata` specifies that the predictions are returned as extra columns of the `newdata` data.frame. By default the 95\% credible intervals are also included. Using the `plot()` method for objects returned by `predict.jm(..., return_newdata = TRUE)`, we can display the predictions. With the following code we do that for the first longitudinal outcome:
+Argument `return_newdata` specifies that the predictions are returned as extra columns of the `newdata` data.frame. By default, the 95\% credible intervals are also included. Using the `plot()` method for objects returned by `predict.jm(..., return_newdata = TRUE)`, we can display the predictions. With the following code, we do that for the first longitudinal outcome:
 ```{r, fig.align = "center", fig.width = 8.5, fig.height = 7.5}
 plot(predLong1)
 ```
 
-When we want to calculate predictions for other, future time points, we can accordingly specify the `times` argument. In the following example, we calculate predictions from time `t0` to time 12:
+When we want to calculate predictions for other future time points, we can accordingly specify the `times` argument. In the following example, we calculate predictions from time `t0` to time 12:
 ```{r}
 predLong2 <- predict(jointFit, newdata = ND,
                      times = seq(t0, 12, length.out = 51),
@@ -120,7 +116,7 @@ The predictions are included again as extra columns in the corresponding data.fr
 plot(predLong2, predSurv)
 ```
 
-Again by default, the plot is for the predictions of the first subject (i.e., Patient 25) and for the first longitudinal outcome (i.e., `log(serBilir)`). However, the `plot()` method has a series of arguments that allows users to customize the plot. We illustrate some of these capabilities with the following figure. First, we specify that we want to depict all three outcomes using `outcomes = 1:3` (note: a max of three outcomes can be simultaneously displayed). Next, we specify via the `subject` argument that we want to show the predictions of Patient 93. Note, that for serum bilirubin we used the log transformation in the specification of the linear mixed model. Hence, we receive predictions on the transformed scale. To show predictions on the original scale, we use the `fun_long` argument. Because we have three outcomes, this needs to be a list of three functions. The first one, corresponding to serum bilirubin is the `exp()` and for the other two the `identity()` because we do not wish to transform the predictions. Analogously, we also have the `fun_event` argument to transform the predictions for the event outcome, and in the example below we set that we want to obtain survival probabilities. Using the arguments `bg`, `col_points`, `col_line_long`, `col_line_event`, `fill_CI_long`, and `fill_CI_event` we have changed the appearance of the plot to a dark theme. Finally, the `pos_ylab_long` specifies the relative positive of the y-axis labels for the three longitudinal outcomes.
+Again, by default, the plot is for the predictions of the first subject (i.e., Patient 25) and the first longitudinal outcome (i.e., `log(serBilir)`). However, the `plot()` method has a series of arguments that allow users to customize the plot. We illustrate some of these capabilities with the following figure. First, we specify that we want to depict all three outcomes using `outcomes = 1:3` (note: a max of three outcomes can be simultaneously displayed). Next, we specify via the `subject` argument that we want to show the predictions of Patient 93. Note that for serum bilirubin, we used the log transformation in the specification of the linear mixed model. Hence, we receive predictions on the transformed scale. To show predictions on the original scale, we use the `fun_long` argument. Because we have three outcomes, this needs to be a list of three functions. The first one, corresponding to serum bilirubin, is the `exp()`, and for the other two the `identity()` because we do not wish to transform the predictions. Analogously, we also have the `fun_event` argument to transform the predictions for the event outcome, and in the example below, we set the goal of obtaining survival probabilities. Using the arguments `bg`, `col_points`, `col_line_long`, `col_line_event`, `fill_CI_long`, and `fill_CI_event`, we have changed the appearance of the plot to a dark theme. Finally, the `pos_ylab_long` specifies the relative positive of the y-axis labels for the three longitudinal outcomes.
 ```{r, eval = FALSE}
 cols <- c('#F25C78', '#D973B5', '#F28322')
 plot(predLong2, predSurv, outcomes = 1:3, subject = 93,
@@ -169,7 +165,7 @@ tvAUC(roc)
 
 This function either accepts an object of class `tvROC` or of class `jm`. In the latter case, the user must also provide the `newdata`, `Tstart` and `Dt` or `Thoriz` arguments. Here we have used the same dataset as the one to fit the model, but, in principle, discrimination could be (better) assessed in another dataset.
 
-To assess the accuracy of the predictions we produce a calibration plot:
+To assess the accuracy of the predictions, we produce a calibration plot:
 ```{r, fig.align = "center", fig.width = 8.5, fig.height = 7.5}
 calibration_plot(jointFit, newdata = pbc2, Tstart = t0, Dt = 3)
 ```
@@ -197,8 +193,5 @@ tvBrier(jointFit, newdata = pbc2, Tstart = t0, Dt = 3, integrated = TRUE,
 
 **Notes:**
 
-- To obtain valid estimates of the predictive accuracy measures (i.e., time-varying sensitivity, specificity, and Brier score) we need to account for censoring. A popular method to achieve this is via inverse probability of censoring weighting. For this approach to be valid, we need the model for the weights to be correctly specified. In standard survival analysis, this is achieved either using the Kaplan-Meier estimator or a Cox model for the censoring distribution. However, in the settings where joint models are used, it is often the case that the censoring mechanism may depend on the history of the longitudinal outcomes in a complex manner. This is especially the case when we consider multiple longitudinal outcomes in the analysis. Also, these outcomes may be recorded at different time points per patient and have missing data. Because of these reasons, in these settings, Kaplan-Meier-based or Cox-based censoring weights may be difficult to derive or be biased. The functions in **JMbayes2** that calculate the predictive accuracy measures use joint-model-based weights to account for censoring. These weights allow censoring to depend in any possible manner on the history of the longitudinal outcomes. However, they require that the model is appropriately calibrated.
+- To obtain valid estimates of the predictive accuracy measures (i.e., time-varying sensitivity, specificity, and Brier score) we need to account for censoring. A popular method to achieve this is via the inverse probability of censoring weighting. For this approach to be valid, we need the model for the weights to be correctly specified. In standard survival analysis, this is achieved either using the Kaplan-Meier estimator or a Cox model for the censoring distribution. However, in the settings where joint models are used, it is often the case that the censoring mechanism may depend on the history of the longitudinal outcomes in a complex manner. This is especially the case when we consider multiple longitudinal outcomes in the analysis. Also, these outcomes may be recorded at different time points per patient and have missing data. Because of these reasons, in these settings, Kaplan-Meier-based or Cox-based censoring weights may be difficult to derive or be biased. The functions in **JMbayes2** that calculate the predictive accuracy measures use joint-model-based weights to account for censoring. These weights allow censoring to depend in any possible manner on the history of the longitudinal outcomes. However, they require that the model is appropriately calibrated.
 - The calibration curve, produced by `calibration_plot()`, and the calibration metrics, produced by `calibration_metrics())`, are calculated using the procedure described in [Austin et al., 2020](https://doi.org/10.1002/sim.8570).
-
-
-
diff --git a/vignettes/JMbayes2.Rmd b/vignettes/JMbayes2.Rmd
index 0d0ead87..5b3525e4 100644
--- a/vignettes/JMbayes2.Rmd
+++ b/vignettes/JMbayes2.Rmd
@@ -21,14 +21,13 @@ library("JMbayes2")
 
 # Fitting Joint Models with JMbayes2
 ## Univariate
-The function that fits joint models in **JMbayes2** is called `jm()`. It has three required arguments, `Surv_object` a Cox model fitted by `coxph()` or an Accelerated Failure time model fitted by `survreg()`, 
-`Mixed_objects` a single or a list of mixed models fitted either by the `lme()` or `mixed_model()` functions, and `time_var` a character string indicating the name of the time variable in the specification of the mixed-effects models. We will illustrate the basic use of the package in the PBC dataset. We start by fitting a Cox model for the composite event transplantation or death, including sex as a baseline covariate:
+The function that fits joint models in **JMbayes2** is called `jm()`. It has three required arguments, `Surv_object` a Cox model fitted by `coxph()` or an Accelerated Failure time model fitted by `survreg()`, `Mixed_objects` a single or a list of mixed models fitted either by the `lme()` or `mixed_model()` functions, and `time_var` a character string indicating the name of the time variable in the specification of the mixed-effects models. We will illustrate the basic use of the package in the PBC dataset. We start by fitting a Cox model for the composite event transplantation or death, including sex as a baseline covariate:
 ```{r}
 pbc2.id$status2 <- as.numeric(pbc2.id$status != 'alive')
 CoxFit <- coxph(Surv(years, status2) ~ sex, data = pbc2.id)
 ```
 
-Our aim is to assess the strength of the association between the risk of the composite event and the levels of serum bilirubin that has been collected during follow-up. We will describe the patient-specific profiles over time for this biomarker using a linear mixed model, with fixed-effects, time, sex, and their interaction, and as random effects random intercepts and random slopes. The syntax to fit this model with `lme()` is:
+We aim to assess the strength of the association between the risk of the composite event and the serum bilirubin levels collected during follow-up. We will describe the patient-specific profiles over time for this biomarker using a linear mixed model, with fixed effects, time, sex, and their interaction, and as random effects, random intercepts, and random slopes. The syntax to fit this model with `lme()` is:
 ```{r}
 fm1 <- lme(log(serBilir) ~ year * sex, data = pbc2, random = ~ year | id)
 ```
@@ -41,7 +40,7 @@ summary(jointFit1)
 
 The output of the `summary()` method provides some descriptive statistics of the sample at hand, followed by some fit statistics based on the marginal (random effects are integrated out using the Laplace approximation) and conditional on the random effects log-likelihood functions, followed by the estimated variance-covariance matrix for the random effects, followed by the estimates for the survival submodel, followed by the estimates for the longitudinal submodel(s), and finally some information for the MCMC fitting algorithm.
 
-By default, `jm()` adds the subject-specific linear predictor of the mixed model as a time-varying covariate in the survival relative risk model. In the output this is named as `value(log(serBilir))` to denote that, by default, the current value functional form is used. That is, we assume that the instantaneous risk of an event at a specific time $t$ is associated with the value of the linear predictor of the longitudinal outcome at the same time point $t$.
+By default, `jm()` adds the subject-specific linear predictor of the mixed model as a time-varying covariate in the survival relative risk model. In the output, this is named as `value(log(serBilir))` to denote that, by default, the current value functional form is used. That is, we assume that the instantaneous risk of an event at a specific time $t$ is associated with the value of the linear predictor of the longitudinal outcome at the same time point $t$.
 
 Standard MCMC diagnostics are available to evaluate convergence. For example, the traceplot for the association coefficient `value(log(serBilir))` is produced with the following syntax:
 ```{r, fig.align='center'}
@@ -54,7 +53,7 @@ ggdensityplot(jointFit1, "alphas")
 ```
 
 ## Multivariate
-To fit a joint model with multiple longitudinal outcomes, we simply provide a list of mixed models as the second argument of `jm()`. In the following example, we extend the joint model we fitted above by also including the prothrombin time and the log odds of the presence or not of ascites as time-varying covariates in the relative risk model for the composite event. Ascites is a dichotomous outcome and therefore we fit a mixed-effects logistic regression model for it using the `mixed_model()` function from the **GLMMadaptive** package. The use of `||` in the `random` argument of `mixed_model()` specifies that the random intercepts and random slopes are assumed uncorrelated. In addition, the argument `which_independent` can be used to specify which longitudinal outcomes are to be assumed independent; here, as an illustration, we specify that the first (i.e., serum bilirubin) and second (i.e., prothrombin time) longitudinal outcomes are independent. To assume that all longitudinal outcomes are independent, we can use `jm(..., which_independent = "all")`. Because this joint model is more complex, we increase the number of MCMC iterations, the number of burn-in iterations and the thinning per chain, using the corresponding control arguments: 
+To fit a joint model with multiple longitudinal outcomes, we simply provide a list of mixed models as the second argument of `jm()`. In the following example, we extend the joint model we fitted above by also including the prothrombin time and the log odds of the presence or absence of ascites as time-varying covariates in the relative risk model for the composite event. Ascites is a dichotomous outcome, and therefore, we fit a mixed-effects logistic regression model for it using the `mixed_model()` function from the **GLMMadaptive** package. The use of `||` in the `random` argument of `mixed_model()` specifies that the random intercepts and random slopes are assumed uncorrelated. In addition, the argument `which_independent` can be used to specify which longitudinal outcomes are to be assumed independent; here, as an illustration, we specify that the first (i.e., serum bilirubin) and second (i.e., prothrombin time) longitudinal outcomes are independent. To assume that all longitudinal outcomes are independent, we can use `jm(..., which_independent = "all")`. Because this joint model is more complex, we increase the number of MCMC iterations, the number of burn-in iterations, and the thinning per chain using the corresponding control arguments: 
 ```{r}
 fm2 <- lme(prothrombin ~ year * sex, data = pbc2, random = ~ year | id)
 fm3 <- mixed_model(ascites ~ year + sex, data = pbc2,
@@ -66,12 +65,12 @@ jointFit2 <- jm(CoxFit, list(fm1, fm2, fm3), time_var = "year",
 summary(jointFit2)
 ```
 
-The output for the survival submodel contains now the estimated coefficients for `value(prothrombin)` and `value(ascites)`, and parameter estimates for all three longitudinal submodels.
+The survival submodel output now contains the estimated coefficients for `value(prothrombin)` and `value(ascites)`, as well as parameter estimates for all three longitudinal submodels.
 
 ## Functional forms
-As mentioned above, the default call to `jm()` includes the subject-specific linear predictors of the mixed-effects models as time-varying covariates in the relative risk model. However, this is just one of the many possibilities we have to link the longitudinal and survival outcomes. The argument `functional_forms` of `jm()` provides additional options. Based on previous experience, two extra functional forms are provided, namely, the time-varying slope and the time-varying *normalized* area/cumulative-effect. The time-varying slope is the first order derivative of the subject-specific linear predictor of the mixed-effect model with respect to the (follow-up) time variable. The time-varying *normalized* area/cumulative-effect is the integral of the subject-specific linear predictor of the mixed-effect model from zero to the current (follow-up) time $t$ divided by $t$. The integral is the area under the subject-specific longitudinal profile; by dividing the integral by $t$ we obtain the average of the subject-specific longitudinal profile over the corresponding period $(0, t)$.
+As mentioned above, the default call to `jm()` includes the subject-specific linear predictors of the mixed-effects models as time-varying covariates in the relative risk model. However, this is just one of the many possibilities for linking longitudinal and survival outcomes. The argument `functional_forms` of `jm()` provides additional options. Based on previous experience, two extra functional forms are provided: the time-varying slope and the time-varying *normalized* area/cumulative effect. The time-varying slope is the first-order derivative of the subject-specific linear predictor of the mixed-effect model with respect to the (follow-up) time variable. The time-varying *normalized* area/cumulative effect is the integral of the subject-specific linear predictor of the mixed-effect model from zero to the current (follow-up) time $t$ divided by $t$. The integral is the area under the subject-specific longitudinal profile; by dividing the integral by $t$, we obtain the average of the subject-specific longitudinal profile over the corresponding period $(0, t)$.
 
-To illustrate how the `functional_forms` argument can be used to specify these functional forms, we update the joint model `jointFit2` by including the time-varying slope of log serum bilirubin instead of the value, and also the interaction of this slope with sex, and for prothrombin we include the normalized cumulative effect. For ascites, we keep the value functional form. The corresponding syntax to fit this model is:
+To illustrate how the `functional_forms` argument can be used to specify these functional forms, we update the joint model `jointFit2` by including the time-varying slope of log serum bilirubin instead of the value and also the interaction of this slope with sex and for prothrombin we include the normalized cumulative effect. For ascites, we keep the current value functional form. The corresponding syntax to fit this model is:
 ```{r}
 fForms <- list(
   "log(serBilir)" = ~ slope(log(serBilir)) + slope(log(serBilir)):sex,
@@ -82,10 +81,10 @@ jointFit3 <- update(jointFit2, functional_forms = fForms)
 summary(jointFit3)
 ```
 
-As seen above, the `functional_forms` argument is a named list with elements corresponding to the longitudinal outcomes. If a longitudinal outcome is not specified in this list, then the default, value functional form, is used for that outcome. Each element of the list should be a one-sided R formula in which the functions `value()`, `slope()` and `area()` can be used. Interaction terms between the functional forms and other (baseline) covariates are also allowed.
+As seen above, the `functional_forms` argument is a named list with elements corresponding to the longitudinal outcomes. If a longitudinal outcome is not specified in this list, then the default value functional form is used for that outcome. Each element of the list should be a one-sided R formula in which the functions `value()`, `slope()`, and `area()` can be used. Interaction terms between the functional forms and other (baseline) covariates are also allowed.
 
 ## Penalized Coefficients using Shrinkage Priors
-When multiple longitudinal outcomes are considered with possibly different functional forms per outcome, we require to fit a relative risk model containing several terms. Moreover, it is often of scientific interest to select which terms/functional-forms per longitudinal outcome are more strongly associated with the risk of the event of interest. To facilitate this selection, `jm()` provides the option to penalize the regression coefficients using shrinkage priors. As an example, we refit `jointFit3` by assuming a Horseshoe prior for the `alphas` coefficients (i.e., the coefficients of the longitudinal outcomes in the relative risk model):
+When multiple longitudinal outcomes are considered with possibly different functional forms per outcome, we require to fit a relative risk model containing several terms. Moreover, it is often of scientific interest to select which terms/functional forms per longitudinal outcome are more strongly associated with the risk of the event of interest. To facilitate this selection, `jm()` allows penalizing the regression coefficients using shrinkage priors. As an example, we refit `jointFit3` by assuming a Horseshoe prior for the `alphas` coefficients (i.e., the coefficients of the longitudinal outcomes in the relative risk model):
 ```{r}
 jointFit4 <- update(jointFit3, priors = list("penalty_alphas" = "horseshoe"))
 cbind("un-penalized" = unlist(coef(jointFit3)), 
diff --git a/vignettes/Multi_State_Processes.Rmd b/vignettes/Multi_State_Processes.Rmd
index 3f4023ac..addfa786 100644
--- a/vignettes/Multi_State_Processes.Rmd
+++ b/vignettes/Multi_State_Processes.Rmd
@@ -21,13 +21,12 @@ library("JMbayes2")
 
 # Multi-state Processes
 ## Introduction
-It is often the case that a subject may transition between multiple states and we are interested to assess the association of longitudinal marker(s) with each of these transitions. In this vignette we will illustrate how to achieve this using **JMbayes2**. 
+A subject may often transition between multiple states, and we are interested in assessing the association of longitudinal marker(s) with each of these transitions. In this vignette, we will illustrate how to achieve this using **JMbayes2**. 
 
 We will consider a simple case with one longitudinal outcome and a three-state (illness-death) model, but this application can be extended for the cases of multiple longitudinal markers and more than three states. 
 
 ## Data
-First we will simulate data from a joint model with a single linear mixed effects model and a multi-state process with three possible states. The multi-state process can be visualized as:
-
+First, we will simulate data from a joint model with a single linear mixed effects model and a multi-state process with three possible states. The multi-state process can be visualized as:
 ```{r ms_figure, echo = FALSE, warning = FALSE, message = FALSE}
 library("ggplot2")
 
@@ -61,10 +60,9 @@ ggplot() + geom_rect(data = d, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax =
   theme_void()
 ```
 
-where all subjects start from state "Healthy" and then can transition to either state "Illness" and then state "Death" or directly to state "Death". In this case, states "Healthy" and "Illness" are *transient* states as the subject, when occupying these states, can still transition to other states whereas "Death" is an absorbing state as when a subject reaches this state then no further transitions can occur. This means that three transitions are possible: $1 \rightarrow 2$, $1 \rightarrow 3$ and $2 \rightarrow 3$ with transition intensities $h_{12}\left(t\right)$, $h_{13}\left(t\right)$ and $h_{23}\left(t\right)$ respectively. 
-
-For our example the default functional form is assumed, i.e., that the linear predictor $\eta(t)$ of the mixed model is associated with the each transition intensity at time $t$. The following piece of code simulates the data:
+where all subjects start from the state "Healthy" and then can transition to either state "Illness" and then state "Death" or directly to state "Death." In this case, states "Healthy" and "Illness" are *transient* states as the subject, when occupying these states, can still transition to other states, whereas "Death" is an absorbing state as when a subject reaches this state then no further transitions can occur. This means that three transitions are possible: $1 \rightarrow 2$, $1 \rightarrow 3$ and $2 \rightarrow 3$ with transition intensities $h_{12}\left(t\right)$, $h_{13}\left(t\right)$ and $h_{23}\left(t\right)$ respectively. 
 
+For our example, the default functional form is assumed, i.e., that the linear predictor $\eta(t)$ of the mixed model is associated with each transition intensity at time $t$. The following piece of code simulates the data:
 ```{r "simulate_Multi_State"}
 set.seed(1234)
 # number of subjects
@@ -232,30 +230,26 @@ DF <- DF[DF$time <= Tstop, ]
 ```
 
 The data for the multi-state process need to be in the appropriate long format:
-
 ```{r}
 head(data_mstate, n = 5L)
 ```
 
-for example subject 1 experienced the following transition: $1 \rightarrow 2$ and therefore is represented in 3 rows, one for each transition, because all of these transitions were plausible. On the other hand subject 2 is only represented by two rows, only for transitions $1 \rightarrow 2$ and $1 \rightarrow 3$ since these are the only transitions that are possible from state 1. That is, since subject 2 never actually transitioned to state 2, transition $2 \rightarrow 3$ was never possible and therefore no row for this transition is in the dataset. It is also important to note that the time in the dataset follows the counting process formulation with intervals specified by `Tstart` and `Tstop` and that there is a variable (in this case `transition`) which indicates to which transition the row corresponds to.
+for example, subject 1 experienced the following transition: $1 \rightarrow 2$ and therefore is represented in 3 rows, one for each transition, because all of these transitions were plausible. On the other hand, subject 2 is only represented by two rows, only for transitions $1 \rightarrow 2$ and $1 \rightarrow 3$ since these are the only transitions possible from state 1. That is since subject 2 never actually transitioned to state 2, transition $2 \rightarrow 3$ was never possible, and therefore, no row for this transition is in the dataset. It is also important to note that the time in the dataset follows the counting process formulation with intervals specified by `Tstart` and `Tstop` and that there is a variable (in this case `transition`) indicating which transition the row corresponds to.
 
 ## Fitting the model
 
-As soon data in the appropriate format are available, fitting the model is very straightforward. First we fit a linear mixed model using the `lme()` function from package **nlme**:
-
+When the data in the appropriate format are available, fitting the model is very straightforward. First we fit a linear mixed model using the `lme()` function from package **nlme**:
 ```{r, "mixed_model_fit"}
-mixedmodel <- lme(y ~ time + X, random = ~ time | id, data = DF)
+mixedmodel <- lme(y ~ time + X, data = DF, random = ~ time | id)
 ```
 
-then we fit a multi-state model using function `coxph()` from package **survival** making sure we use the counting process specification and that we add `strata(transition)` to stratify by the transition indicator variable in the dataset. Furthermore we add an interaction between covariate `X` and each transition to allow the effect of this covariate to vary across transitions.
-
+Then, we fit a multi-state model using function `coxph()` from package **survival** making sure we use the counting process specification and that we add `strata(transition)` to stratify by the transition indicator variable in the dataset. Furthermore, we add an interaction between covariate `X` and each transition to allow the effect of this covariate to vary across transitions.
 ```{r, "multi_state_model_fit"}
 msmodel <- coxph(Surv(Tstart, Tstop, status) ~ X * strata(transition), 
                  data = data_mstate)
 ```
 
-Finally, to fit the joint model we simply run:
-
+Finally, to fit the joint model, we simply run:
 ```{r, "jm_fit"}
 jm_ms_model <- jm(msmodel, mixedmodel, time_var = "time", 
                   functional_forms = ~ value(y):transition, n_iter = 10000L)
@@ -263,7 +257,4 @@ jm_ms_model <- jm(msmodel, mixedmodel, time_var = "time",
 summary(jm_ms_model)
 ```
 
-which differs from a default call to `jm()` by the addition of the `functional_forms` argument by which we specify that we want an "interaction" between the value of the marker and each transition which translates into a separate association parameter for the longitudinal marker and each transition.
-
-
-
+which differs from a default call to `jm()` by the addition of the `functional_forms` argument specifying that we want an "interaction" between the marker's value and each transition, which translates into a separate association parameter for the longitudinal marker and each transition.
diff --git a/vignettes/Non_Gaussian_Mixed_Models.Rmd b/vignettes/Non_Gaussian_Mixed_Models.Rmd
index d7712669..de82ea8d 100644
--- a/vignettes/Non_Gaussian_Mixed_Models.Rmd
+++ b/vignettes/Non_Gaussian_Mixed_Models.Rmd
@@ -24,9 +24,9 @@ Taking advantage of the versatility of the [**GLMMadaptive**](https://drizopoulo
 
 
 ## Beta mixed models
-With very few exceptions, continuous outcomes that we wish to analyze have some natural bounds. For example, the levels of blood biomarker for a set of patients. However, often, the majority of the observations are located far away from these natural bounds, and an assumption of a normal distribution for the outcome can be safely done. In some settings though, we can have outcomes for which a substantial percentage of the observations are located near the boundaries leading to skewed or U-shaped distributions. For such longitudinal outcomes, a linear mixed model with normal error terms often does not provide a good fit. A natural alternative is to select a distribution that respects the bounded nature of the outcome. The most well-known distribution for such outcomes is the Beta distribution defined in the $(0, 1)$ interval (*note:* a bounded outcome $Y^*$ in the $(a, b)$ interval can be transformed to the $Y = (Y^* - a) / (b - a)$ in the $(0, 1)$ interval). 
+With very few exceptions, continuous outcomes that we wish to analyze have some natural bounds. For example, the levels of blood biomarkers for a set of patients. However, most observations are often located far away from these natural bounds, and an assumption of a normal distribution for the outcome can be safely made. In some settings, though, we can have outcomes for which a substantial percentage of the observations are located near the boundaries leading to skewed or U-shaped distributions. A linear mixed model with normal error terms often does not provide a good fit for such longitudinal outcomes. A natural alternative is to select a distribution that respects the bounded nature of the outcome. The most well-known distribution for such outcomes is the Beta distribution defined in the $(0, 1)$ interval (*note:* a bounded outcome $Y^*$ in the $(a, b)$ interval can be transformed to the $Y = (Y^* - a) / (b - a)$ in the $(0, 1)$ interval). 
 
-The following piece of code illustrates how to simulate data from a joint model with a Beta mixed effects model. The default functional form is assumed, i.e., that the linear predictor $\eta(t)$ of the mixed model is associated with the hazard of an event at time $t$. The linear predictor is related to the mean $\mu(t)$ of the Beta distribution under the logit link function, i.e., $\log[\mu(t) / \{1 - \mu(t)\}] = \eta(t)$.
+The following code illustrates how to simulate data from a joint model with a Beta mixed effects model. The default functional form is assumed, i.e., that the linear predictor $\eta(t)$ of the mixed model is associated with the hazard of an event at time $t$. The linear predictor is related to the mean $\mu(t)$ of the Beta distribution under the logit link function, i.e., $\log[\mu(t) / \{1 - \mu(t)\}] = \eta(t)$.
 ```{r, "simulate_Beta"}
 set.seed(1234)
 n <- 200 # number of subjects
@@ -123,9 +123,9 @@ summary(jointFit)
 
 
 ## Censored linear mixed models
-Some continuous longitudinal outcomes may have a censored nature. A typical example of such outcomes is when we have limit of detection issues. That is, the values of the outcome cannot be detected below a specified threshold having to do with the (laboratory) equipment used to determine the measurements. In these settings and even if the complete data follow a normal distribution the observed censored data cannot be analyzed with a standard mixed model. The `mixed_model()` function can accommodate such outcomes using the `censored.normal()` family object.
+Some continuous longitudinal outcomes may have a censored nature. A typical example of such outcomes is when we have limit of detection issues. That is, the values of the outcome cannot be detected below a specified threshold having to do with the (laboratory) equipment used to determine the measurements. In these settings, and even if the complete data follows a normal distribution the observed censored data cannot be analyzed with a standard mixed model. The `mixed_model()` function can accommodate such outcomes using the `censored.normal()` family object.
 
-The following piece of code simulates data from a joint model with a linear mixed model for the longitudinal outcomes, but applies censoring in the realized longitudinal observations.
+The following piece of code simulates data from a joint model with a linear mixed model for the longitudinal outcomes but applies censoring in the realized longitudinal observations.
 ```{r, "simulate_CensNorm"}
 set.seed(1234)
 n <- 200 # number of subjects
@@ -220,7 +220,7 @@ summary(jointFit)
 
 
 ## Students's-t mixed models
-Outlying observations is a common occurring issue in practice. Several methods have been proposed in the literature for identifying such observations in the context of longitudinal data. However, removing such values from the analysis is in general not recommended unless we also have external information why these values are outlying. Hence, in these settings we would need to fit mixed models to accommodate these observations. A well-known approach to achieve this is to replace the normal distribution for the error terms in the linear mixed model with a Student's-t distribution that has heavier tails.
+Outlying observations are a common issue in practice. Several methods have been proposed in the literature for identifying such observations in the context of longitudinal data. However, removing such values from the analysis is generally not recommended unless we also have external information as to why these values are outlying. Hence, we would need to fit mixed models to accommodate these observations in these settings. A well-known approach to achieve this is replacing the normal distribution for the error terms in the linear mixed model with a Student's-t distribution with heavier tails.
 
 The following syntax simulates data from a joint model with a Student's-t mixed effects model:
 ```{r, "simulate_Std-t"}
@@ -316,9 +316,9 @@ summary(jointFit)
 
 
 ## Negative binomial mixed models
-Count longitudinal outcomes are typically modeled with the Poisson distribution. However, often these outcomes exhibit more variance than what is allowed from the Poisson distribution leading to the well-known problem of over-dispersion. To accommodate this over-dispersion typically the Negative Binomial distribution is used. 
+Count longitudinal outcomes are typically modeled with the Poisson distribution. However, these outcomes often exhibit more variance than what is allowed from the Poisson distribution, leading to the well-known problem of over-dispersion. To accommodate this over-dispersion, typically, the negative binomial distribution is used. 
 
-The following piece of code simulates data from a joint model for count longitudinal data that follow the Negative Binomial distribution:
+The following piece of code simulates data from a joint model for count longitudinal data that follow the negative binomial distribution:
 ```{r, "simulate_NB"}
 set.seed(1234)
 n <- 500 # number of subjects
@@ -414,9 +414,9 @@ summary(jointFit)
 
 
 ## Beta-binomial longitudinal outcomes
-As for count data also for Binomial data we may have the over-dispersion problem. In this case, we can change to standard Binomial distribution to a Beta-Binomial one to accommodate this.
+For count data and binomial data, we may have an over-dispersion problem. To accommodate this, we can change the standard binomial distribution to a beta-binomial one.
 
-The following piece of code simulates data from a joint model for binomial longitudinal data that follow the Beta-Binomial distribution:
+The following piece of code simulates data from a joint model for binomial longitudinal data that follow the beta-binomial distribution:
 ```{r, "simulate_BetaBinom"}
 set.seed(1234)
 n <- 500 # number of subjects
@@ -511,5 +511,3 @@ BetaBinom_MixMod <-
 jointFit <- jm(Cox_fit, BetaBinom_MixMod, time_var = "time")
 summary(jointFit)
 ```
-
-<div align="right"><a href="#top">Back to top</a></div>
diff --git a/vignettes/Recurring_Events.Rmd b/vignettes/Recurring_Events.Rmd
index a13662ff..23f9a1d0 100644
--- a/vignettes/Recurring_Events.Rmd
+++ b/vignettes/Recurring_Events.Rmd
@@ -23,9 +23,9 @@ library("JMbayes2")
 
 ## Introduction
 
-**JMbayes2** provides also the capability to fit joint models with a recurrent event process possibly combined with a terminating event. Recurrent events are correlated events that may occur more than once over the follow-up period for a given subject. Our current implementation allows for multiple longitudinal markers with different distributions, and various functional forms to link these markers with the risk of recurrent and terminal events. Furthermore, it enables the risk intervals to be defined in terms of the *gap* or *calendar* timescales. The two timescales use a different zero-time reference. The *calendar* uses the study entry, while the *gap* uses the end of the previous event (Figure 1). *Gap* assumes a renewal after each event and resets the time to zero.
+**JMbayes2** also provides the capability to fit joint models with a recurrent event process, possibly combined with a terminating event. Recurrent events are correlated events that may occur more than once over the follow-up period for a given subject. Our current implementation allows for multiple longitudinal markers with different distributions and various functional forms to link these markers with the risk of recurrent and terminal events. Furthermore, it enables the risk intervals to be defined in terms of the *gap* or *calendar* timescales. The two timescales use a different zero-time reference. The *calendar* uses the study entry, while the *gap* uses the end of the previous event (Figure 1). *Gap* assumes a renewal after each event and resets the time to zero.
 
-```{r timescale, fig.align = "center", fig.width = 5.5, echo = FALSE, fig.cap = "<b>Figure 1</b> Visual representation of an hazard function under the gap or calendar timescale. During the folow-up the subject experienced three events."}
+```{r timescale, fig.align = "center", fig.width = 5.5, echo = FALSE, fig.cap = "<b>Figure 1</b> Visual representation of an hazard function under the gap or calendar timescale. During the follow-up, the subject experienced three events."}
 col_f <- function(color, percent = 50) {
   rgb.col <- col2rgb(color)
   new.col <- rgb(rgb.col[1], rgb.col[2], rgb.col[3], max = 255,
@@ -123,7 +123,7 @@ abline(v = ev_start, lty = 2)
 
 The model also accommodates discontinuous risk intervals, i.e., periods in which the subject is not at risk of experiencing the recurring event (Figure 2). For example, while a patient is in the hospital, they are not at risk of being hospitalized again.
 
-```{r disc_risk, fig.align = "center", fig.width = 5.5, echo = FALSE, fig.cap = "<b>Figure 2</b> Visual representation of an hazard function under the gap or calendar timescale, while accounting for non-risk periods (gray areas). During the folow-up the subject experienced three events."}
+```{r disc_risk, fig.align = "center", fig.width = 5.5, echo = FALSE, fig.cap = "<b>Figure 2</b> Visual representation of an hazard function under the gap or calendar timescale, while accounting for non-risk periods (gray areas). During the follow-up, the subject experienced three events."}
 set.seed(2021)
 phi <- 0.3 # weibull scale
 sigma_t <- 0.8  # weibull shape
@@ -231,13 +231,13 @@ $$
 $$
 <br>
 
-where $i = 1, \ldots, n$. For the longitudinal outcomes we specify linear mixed-effects models, and for the terminal and recurrence processes, we use proportional hazard models. The longitudinal and event time processes are linked via a latent structure of random effects, highlighted by the same color in the equations above. The terms $\mathcal{f}_{R_j}\left\{\eta_{j_i}(t)\right\}$ and $\mathcal{f}_{R_j}\left\{\eta_{j_i}(t)\right\}$ describe the functional forms that link the longitudinal marker $j$ with the risk of the recurrent and terminal events, respectively. The frailty $b_{F_i}$ is a random effect that accounts for the correlations in the recurrent events. The coefficient $\alpha_{F}$ quantifies the strength of the association between the terminal and recurrent event processes. For notational simplicity, in the formulation presented above we have shown normally distributed longitudinal outcomes; however, **JMbayes2** provides the option to consider longitudinal outcomes with [different distributions](https://drizopoulos.github.io/JMbayes2/articles/Non_Gaussian_Mixed_Models.html).
+where $i = 1, \ldots, n$. We specify linear mixed-effects models for the longitudinal outcomes, and for the terminal and recurrence processes, we use proportional hazard models. The longitudinal and event time processes are linked via a latent structure of random effects, highlighted by the same color in the equations above. The terms $\mathcal{f}_{R_j}\left\{\eta_{j_i}(t)\right\}$ and $\mathcal{f}_{R_j}\left\{\eta_{j_i}(t)\right\}$ describe the functional forms that link the longitudinal marker $j$ with the risk of the recurrent and terminal events, respectively. The frailty $b_{F_i}$ is a random effect that accounts for the correlations in the recurrent events. The coefficient $\alpha_{F}$ quantifies the strength of the association between the terminal and recurrent event processes. For notational simplicity, in the formulation presented above, we have shown normally distributed longitudinal outcomes; however, **JMbayes2** provides the option to consider longitudinal outcomes with [different distributions](https://drizopoulos.github.io/JMbayes2/articles/Non_Gaussian_Mixed_Models.html).
 
 
 ## Example
 
 ### Data
-We simulate data from a joint model with three outcomes: one longitudinal outcome, one terminal failure-time, and one recurrent failure-time. We assume that the underlying value of the longitudinal outcome is associated with both risk models and use the *gap* timescale. The reader can easily extend this example to accommodate multiple longitudinal markers with other forms of association, include competing risks, consider only the recurrent events process, or use a different timescale.
+We simulate data from a joint model with three outcomes: one longitudinal outcome, one terminal failure time, and one recurrent failure time. We assume that the underlying value of the longitudinal outcome is associated with both risk models and use the *gap* timescale. The reader can easily extend this example to accommodate multiple longitudinal markers with other forms of association, including competing risks, considering only the recurrent events process, or using a different timescale.
 ```{r gen_data}
 gen_data <- function(){
   n <- 500 # desired number of subjects 
@@ -392,7 +392,7 @@ recu_data <- fake_data$rec # recurrent events data
 lme_data <- fake_data$long # longitudial marker data
 ```
 
-We now have three data frames, each one corresponding to a different outcome. To fit the joint model, the user needs to organize the failure-time data in the counting process formulation by combining the data for the terminal and recurrent events. Using then a strata variable to distinguish between the two processes. To facilitate this, we provide in the package the `rc_setup()` function: 
+We now have three data frames, each one corresponding to a different outcome. To fit the joint model, the user must organize the failure-time data in the counting process formulation by combining the data for the terminal and recurrent events. Then, a strata variable is used to distinguish between the two processes. To facilitate this, we provide in the package the `rc_setup()` function: 
 ```{r trf_data1}
 cox_data <- rc_setup(rc_data = recu_data, trm_data = term_data,
                      idVar = "id", statusVar = "status",
@@ -401,13 +401,13 @@ cox_data <- rc_setup(rc_data = recu_data, trm_data = term_data,
                      nameStrata = "strata", nameStatus = "status")
 ```
 
-Each subject has as many rows in the new data frame as the number of their recurrent risk periods plus one for the terminal event. The data frame follows the counting process formulation with the risk intervals delimited by `start` and `stop` variables. The `strata` variable denotes if the type of event, `1` if recurrent or `2` terminal. The `status` equals `1` if the subject had an event and `0` otherwise. As shown below and in Figure 3, the subject 1 experienced seven recurrent events during the follow up; the eighth recurrent event was censored by the terminal event. 
+Each subject has as many rows in the new data frame as the number of their recurrent risk periods plus one for the terminal event. The data frame follows the counting process formulation with the risk intervals delimited by `start` and `stop` variables. The `strata` variable denotes the type of event, `1` if recurrent, or `2` terminal. The `status` equals `1` if the subject had an event and `0` otherwise. As shown below and in Figure 3, subject 1 experienced seven recurrent events during the follow-up; the terminal event censored the eighth recurrent event. 
 
 ```{r trf_data2}
 cox_data[cox_data$id == 1, c("id", "tstart", "tstop", "status", "strata")]
 ```
 
-```{r trf_data3, echo = FALSE, fig.align = "center", fig.width = 5.5, echo = FALSE, fig.cap = "<b>Figure 3</b> Visual representation of the failure-time data during the follow-up for subject 1. The horizontal black line denotes risk periods, while the blue line non-risk periods. 'R' and 'T' represent a recurrent and terminal event, respectively."}
+```{r trf_data3, echo = FALSE, fig.align = "center", fig.width = 5.5, echo = FALSE, fig.cap = "<b>Figure 3</b> Visual representation of the failure-time data during the follow-up for subject 1. The horizontal black line denotes risk periods, while the blue line denotes non-risk periods. 'R' and 'T' represent a recurrent and terminal event, respectively."}
 data <- cox_data[cox_data$id == 1,]
 
 col_seg1 <- 1
@@ -458,7 +458,7 @@ for(i in unique(data$id)) {
 ```
 
 ### Fitting the model
-The user then needs to use the `nlme::lme()` function to first fit the linear mixed model that describes the longitudinal outcome,
+The user then needs to use the `nlme::lme()` function first to fit the linear mixed model that describes the longitudinal outcome,
 ```{r fit_lme}
 lme_fit <- lme(y ~ ns(time, k =  c(1, 3), B = c(0, 7)), 
                random = list(id = pdDiag(form = ~ ns(time, k = c(1, 3), 
@@ -473,7 +473,7 @@ cox_fit <- coxph(Surv(tstart, tstop, status) ~ (group + age):strata(strata),
                  data = cox_data)
 ```
 
-These models are then provided as arguments in the `jm()` function. The user specifies the desired functional forms for the mixed model in each relative-risk model. And with the `recurrent` argument specifies the desired timescale,
+These models are then provided as arguments in the `jm()` function. The user specifies the desired functional forms for the mixed model in each relative-risk model. And with the `recurrent` argument specifying the desired timescale,
 ```{r fit_jm}
 jm_fit <- jm(cox_fit, lme_fit, time_var = "time", recurrent = "gap",
              functional_forms =  ~ value(y):strata)
@@ -482,4 +482,3 @@ summary(jm_fit)
 ```
 
 One can find the association parameters between the underlying value of the longitudinal outcome and the recurrent and terminating event processes in the summary output as `value(y):strataRec` ($\alpha_{R_1}$) and `value(y):strataTer` ($\alpha_{T_1}$), respectively. $\exp\{\alpha_{R_1}\}$ denotes the relative increase in the risk of the next recurrent event at time $t$ that results from one unit increase in $\eta_{1_i}(t)$ since <u>the end of the previous event</u>^[This is the time reference because we are using the gap timescale. Alternatively, if we were using the calendar timescale, it would be <u>the entry time in the study</u>.]. The association parameter for the frailty term in the terminal risk model, $\alpha_{F}$, is identified in the output as `frailty`. The `sigma_frailty` refers to the frailty standard deviation, $\sigma_F$.
-
diff --git a/vignettes/Super_Learning.Rmd b/vignettes/Super_Learning.Rmd
index 9f6bc9bd..2b7a2163 100644
--- a/vignettes/Super_Learning.Rmd
+++ b/vignettes/Super_Learning.Rmd
@@ -21,9 +21,9 @@ library("JMbayes2")
 
 # Super Learning
 ## Motivation and Theory
-Joint models for longitudinal and time-to-event data have been established as a versatile tool for calculating dynamic predictions for longitudinal and survival outcomes. The advantageous feature of these predictions is that they are updated over time as extra information becomes available. As a result, they have found numerous applications in precision medicine, including cancer and cardiovascular diseases. Previous applications of joint models have considered a single model for obtaining dynamic predictions. However, finding a well-specified model can be challenging, especially when multiple longitudinal outcomes are considered. Moreover, due to the dynamic nature of these predictions, different models may provide different levels of accuracy at different follow-up times. Here we consider multiple joint models instead, and we combine the dynamic predictions from these models to optimize the predictive accuracy. We use the concept of super learning (SL) to achieve this. SL is an ensemble method that allows researchers to combine several different prediction algorithms into one. It uses $V$-fold cross-validation to build the optimally weighted combination of predictions from a library of candidate algorithms. Optimality is defined by a user-specified objective function, such as minimizing mean squared error or maximizing the area under the receiver operating characteristic curve.
+Joint models for longitudinal and time-to-event data have been established as a versatile tool for calculating dynamic predictions for longitudinal and survival outcomes. The advantageous feature of these predictions is that they are updated over time as extra information becomes available. As a result, they have found numerous applications in precision medicine, including cancer and cardiovascular diseases. Previous applications of joint models have considered a single model for obtaining dynamic predictions. However, finding a well-specified model can be challenging, especially considering multiple longitudinal outcomes. Moreover, due to the dynamic nature of these predictions, different models may provide different levels of accuracy at different follow-up times. Here, we consider multiple joint models instead and combine the dynamic predictions from these models to optimize the predictive accuracy. We use the concept of super learning (SL) to achieve this. SL is an ensemble method that allows researchers to combine several different prediction algorithms into one. It uses $V$-fold cross-validation to build the optimally weighted combination of predictions from a library of candidate algorithms. Optimality is defined by a user-specified objective function, such as minimizing mean squared error or maximizing the area under the receiver operating characteristic curve.
 
-The basic idea behind super learning is to derive model weights that optimize the cross-validated predictions. More specifically, we let $\mathcal{L} = \{M_1, \ldots, M_L\}$ denote a library with $L$ models. There are no restrictions to the models included in this library, and actually, it is recommended to consider a wide range of possible models. Among others, these joint models differ in the specification of the time trend in the longitudinal submodels and the functions form and event submodel. We split the original dataset $\mathcal{D}_n$ in $V$ folds. The choice of $V$ will depend on the size and number of events in $\mathcal{D}_n$. In particular, for each fold, we need to have a sufficient number of events to robustly quantify the predictive performance. Using the cross-validation method, we fit the $L$ models in the combined $v-1$ folds, and we will calculate predictions for the $v$-th fold we left outside. Due to the dynamic nature of the predictions, we want to derive optimal weights at different follow-up times. More specifically, we consider the sequence of time points $t_1, \ldots, t_Q$. The number and placing of these time points should again consider the available event information in $\mathcal{D}_n$. For $t_q \in \{t_1, \ldots, t_Q\}$, we define $\mathcal{R}(t_q, v)$ to denote the subjects at risk at time $t_q$ that belong to the $v$-th fold. For all subjects in $\mathcal{R}(t_q, v)$, we calculate the cross-validated predictions,
+The basic idea behind super learning is to derive model weights that optimize the cross-validated predictions. More specifically, we let $\mathcal{L} = \{M_1, \ldots, M_L\}$ denote a library with $L$ models. There are no restrictions to the models included in this library, and it is recommended to consider a wide range of possible models. Among others, these joint models differ in the specification of the time trend in the longitudinal submodels and the functions form and event submodel. We split the original dataset $\mathcal{D}_n$ in $V$ folds. The choice of $V$ will depend on the size and number of events in $\mathcal{D}_n$. In particular, for each fold, we need to have a sufficient number of events to quantify the predictive performance robustly. Using the cross-validation method, we fit the $L$ models in the combined $v-1$ folds, and we will calculate predictions for the $v$-th fold we left outside. Due to the dynamic nature of the predictions, we want to derive optimal weights at different follow-up times. More specifically, we consider the sequence of time points $t_1, \ldots, t_Q$. The number and placing of these time points should again consider the available event information in $\mathcal{D}_n$. For $t_q \in \{t_1, \ldots, t_Q\}$, we define $\mathcal{R}(t_q, v)$ to denote the subjects at risk at time $t_q$ that belong to the $v$-th fold. For all subjects in $\mathcal{R}(t_q, v)$, we calculate the cross-validated predictions,
 $$\hat{\pi}_i^{(v)}(t_q + \Delta t \mid t_q, M_l) = \Pr \{T_i^* < t_q + \Delta t \mid T_i^* > t_q, \mathcal H_i(t), M_l, \mathcal{D}_n^{(-v)}\}.$$
 These predictions are calculated based on model $M_l$ in library $\mathcal{L}$ that was fitted in the dataset $\mathcal{D}_n^{(-v)}$ the excludes the patients in the $v$-th fold. The calculation is based on a Monte Carlo approach. We define $\hat{\tilde{\pi}}_i^{v}(t_q + \Delta t \mid t_q)$ to denote the convex combination of the $L$ predictions, i.e.,
 $$\hat{\tilde{\pi}}_i^{v}(t_q + \Delta t \mid t_q) = \sum\limits_{l = 1}^L \varpi_l(t_q) \hat{\pi}_i^{(v)}(t_q + \Delta t \mid t_q, M_l), \quad \mbox{for all } v \in {1, \ldots, V},$$
@@ -53,7 +53,7 @@ CVdats <- create_folds(pbc2, V = 5, id_var = "id")
 
 The first argument for this function is the `data.frame` we wish to split in `V` folds. The argument `id_var` specifies the name of the subjects id variable in this dataset. The output of `create_folds()` is a list with two components named `"training"` and `"testing"`. Each component is another list with `V` data.frames.
 
-Next, we define the function that will fit the joint models we wish to consider for calculating predictions. This function should have as a single argument a `data.frame` that will be used to fit the joint models. To optimize computational performance we will use parallel computing to fit these models to the different training datasets. Hence, within the function we should have the call `library("JMbayes2")` to load package **JMbayes2** in each worker. The output of this function should be a list of the fitted joint models with class `"jmList"`. Assigning this class to the resulting list will facilitate combining the predictions later. For our example we use the following specification:
+Next, we define the function that will fit the joint models we wish to consider for calculating predictions. This function should have as a single argument a `data.frame` that will be used to fit the joint models. To optimize computational performance we will use parallel computing to fit these models to the different training datasets. Hence, within the function we should have the call `library("JMbayes2")` to load package **JMbayes2** in each worker. The output of this function should be a list of the fitted joint models with class `"jmList"`. Assigning this class to the resulting list will facilitate combining the predictions later. For our example, we use the following specifications:
 ```{r, "fit models function"}
 fit_models <- function (data) {
     library("JMbayes2")
@@ -82,7 +82,7 @@ fit_models <- function (data) {
 }
 ```
 
-In particular, we consider a library of univariate joint models for the longitudinal outcome `serBilir` and the composite event transplantation or death. The first three models consider a simple linear mixed effects model for serum bilirubin with random intercepts and random slopes per subject, and no other covariates. Also, in the Cox model for the composite event we do not specify any baseline covariates; hence, the risk of the composite event depends only on serum bilirubin. The three models differ in the corresponding functional forms, i.e., the current value of log serum bilirubin, the current slope/velocity of log serum bilirubin, and the current value plus the area under the log serum bilurubin trajectory. The last models consider a more elaborate specification of the linear mixed model that includes natural cubic splines in both the fixed and random effects to allow for non-linearities in the log serum bilirubin trajectories, and the main effects of sex and age in both the mixed and Cox models. The functional forms are again the current value and the current slope.
+In particular, we consider a library of univariate joint models for the longitudinal outcome `serBilir` and the composite event transplantation or death. The first three models consider a simple linear mixed effects model for serum bilirubin with random intercepts and random slopes per subject and no other covariates. Also, in the Cox model for the composite event, we do not specify any baseline covariates; hence, the risk of the composite event depends only on serum bilirubin. The three models differ in the corresponding functional forms, i.e., the current value of log serum bilirubin, the current slope/velocity of log serum bilirubin, and the current value plus the area under the log serum bilirubin trajectory. The last models consider a more elaborate specification of the linear mixed model that includes natural cubic splines in both the fixed and random effects to allow for non-linearities in the log serum bilirubin trajectories and the main effects of sex and age in both the mixed and Cox models. The functional forms are, again, the current value and the current slope.
 
 We fit these models in the training datasets using parallel computing as facilitated using the **parallel** package (*note*: this and the subsequent computations require some time to perform depending on your machine; in my machine with an Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz and 32.0 GB RAM takes about 20 min to run the whole vignette):
 ```{r, "fit the models using parallel computing"}
@@ -91,7 +91,7 @@ Models_folds <- parallel::parLapply(cl, CVdats$training, fit_models)
 parallel::stopCluster(cl)
 ```
 
-We calculate the weights to combine the predictions from these five models to optimize the integrated Brier score the expected predictive cross-entropy at follow-up time $t = 6$ years and for a relevant window of $\Delta t = 2$ years. Function `tvBrier()` automatically performs this task by providing as a first argument the list of joint models fitted in the training datasets. The integrated Brier score is calculated using the testing datasets that are provided in the `newdata` argument: 
+We calculate the weights to combine the predictions from these five models to optimize the integrated Brier score and the expected predictive cross-entropy at follow-up time $t = 6$ years and for a relevant window of $\Delta t = 2$ years. The function `tvBrier()` automatically performs this task by providing the list of joint models fitted in the training datasets as a first argument. The integrated Brier score is calculated using the testing datasets that are provided in the `newdata` argument: 
 ```{r, "calculate Brier weights"}
 tstr <- 6
 thor <- 8
@@ -101,21 +101,21 @@ Brier_weights <- tvBrier(Models_folds, newdata = CVdats$testing,
 Brier_weights
 ```
 
-We observe that the first two models dominate the weights. We also note that the integrated Brier score based on the combined predictions is smaller than the integrated Brier score of each individual model. To calculate the model weights using the expected predictive cross-entropy use function `tvEPCE()` with an almost identical call as for the Brier score:
+We observe that the first two models dominate the weights. We also note that the integrated Brier score based on the combined predictions is smaller than the integrated Brier score of each individual model. To calculate the model weights using the expected predictive cross-entropy, use function `tvEPCE()` with an almost identical call as for the Brier score:
 ```{r, "calculate EPCE weights"}
 EPCE_weights <- tvEPCE(Models_folds, newdata = CVdats$testing, 
                        Tstart = tstr, Thoriz = thor)
 EPCE_weights
 ```
 
-The EPCE results are similar with the ones from the integrated Brier score; however, now only models `M2` and `M3` share the weights. Again we see that the EPCE based on the combined cross-validated predictions is smaller than the EPCE based on the cross-validated predictions of each individual model. 
+The EPCE results are similar to those from the integrated Brier score; however, now only models `M2` and `M3` share the weights. Again, we see that the EPCE based on the combined cross-validated predictions is smaller than the EPCE based on the cross-validated predictions of each individual model. 
 
-To put these weights in practice we need first to refit the five joint models we considered in the original dataset.
+To use these weights in practice, we must first refit the five joint models we considered in the original dataset.
 ```{r, "fit the model in the original dataset", warning = FALSE}
 Models <- fit_models(pbc2)
 ```
 
-Then we construct the dataset with the subjects at risk at year six and consider the longitudinal measurements collected before this follow-up time. Also, we set in this dataset the observed event time to six, and the event variable to zero, i.e., indicating that patients were event-free up to this time:
+Then, we construct the dataset with the subjects at risk at year six and consider the longitudinal measurements collected before this follow-up time. Also, we set in this dataset the observed event time to six and the event variable to zero, i.e., indicating that patients were event-free up to this time:
 ```{r, "data for prediction"}
 ND <- pbc2[pbc2$years > tstr & pbc2$year <= tstr, ]
 ND$id <- ND$id[, drop = TRUE]
diff --git a/vignettes/Time_Varying_Effects.Rmd b/vignettes/Time_Varying_Effects.Rmd
index 9301e420..dd7d902c 100644
--- a/vignettes/Time_Varying_Effects.Rmd
+++ b/vignettes/Time_Varying_Effects.Rmd
@@ -28,7 +28,7 @@ pbc2.id$status2 <- as.numeric(pbc2.id$status != 'alive')
 CoxFit <- coxph(Surv(years, status2) ~ sex, data = pbc2.id)
 ```
 
-Our aim is to assess the strength of the association between the risk of the composite event and the level of serum bilirubin. We will describe the patient-specific profiles over time for this biomarker using a linear mixed-effects model, where both in the fixed- and random-effects we include an intercept, the linear and quadratic time effect. In the fixed-effects we also include the interaction of the time effect and sex. The syntax to fit this model with `lme()` is:
+We aim to assess the strength of the association between the risk of the composite event and the serum bilirubin level. We will describe the patient-specific profiles over time for this biomarker using a linear mixed-effects model, where we include an intercept in both the fixed and random effects, as well as the linear and quadratic time effects. In the fixed effects, we also include the interaction of the time effect and sex. The syntax to fit this model with `lme()` is:
 ```{r}
 fm <- lme(log(serBilir) ~ poly(year, 2) * sex, data = pbc2, 
           random = ~ poly(year, 2) | id, control = lmeControl(opt = 'optim'))
@@ -40,7 +40,7 @@ jointFit1 <- jm(CoxFit, fm, time_var = "year")
 summary(jointFit1)
 ```
 
-To specify that the association of serum bilirubin may change over time we include an interaction of this time-varying covariate with a natural cubic spline of time using function `ns()` from the **splines** package. **Important Note:** for this to work correctly we need to explicitly specify the internal and boundary knots for the B-splines basis, i.e., in the following example we set the internal knots at 3, 6 and 9 years, and the boundary knots at 0 and 14.5 years:
+To specify that the association of serum bilirubin may change over time, we include an interaction of this time-varying covariate with a natural cubic spline of time using function `ns()` from the **splines** package. **Important Note:** For this to work correctly, we need to explicitly specify the internal and boundary knots for the B-splines basis, i.e., in the following example, we set the internal knots at 3, 6, and 9 years, and the boundary knots at 0 and 14.5 years:
 ```{r}
 form_splines <- ~ value(log(serBilir)) * ns(year, k = c(3, 6, 9), B = c(0, 14.5))
 jointFit2 <- update(jointFit1, functional_forms = form_splines, 
@@ -48,7 +48,7 @@ jointFit2 <- update(jointFit1, functional_forms = form_splines,
 summary(jointFit2)
 ```
 
-The spline coefficients do not have a straightforward interpretation. We therefore visualize the time-varying association of log serum bilirubin with the hazard of the composite event using the following piece of code: 
+The spline coefficients do not have a straightforward interpretation. We, therefore, visualize the time-varying association of log serum bilirubin with the hazard of the composite event using the following piece of code: 
 ```{r, fig.align = "center", fig.width = 8.5, fig.height = 7.5}
 x_times <- seq(0.001, 12, length = 501)
 X <- cbind(1, ns(x_times, knots = c(3, 6, 9), B = c(0, 14.5)))
@@ -73,4 +73,4 @@ We observe that the 95% credible interval for the time-varying coefficient inclu
 compare_jm(jointFit1, jointFit2)
 ```
 
-Both the WAIC and LPML indicate that `jointFit1` is a better model than `jointFit2`. The DIC has the same magnitude for both models.
+The WAIC and LPML indicate that `jointFit1` is a better model than `jointFit2`. The DIC has the same magnitude for both models.
diff --git a/vignettes/Transformation_Functions.Rmd b/vignettes/Transformation_Functions.Rmd
index d7364555..de4e93a3 100644
--- a/vignettes/Transformation_Functions.Rmd
+++ b/vignettes/Transformation_Functions.Rmd
@@ -29,7 +29,7 @@ pbc2.id$status2 <- as.numeric(pbc2.id$status != 'alive')
 CoxFit <- coxph(Surv(years, status2) ~ sex, data = pbc2.id)
 ```
 
-Our aim is to assess the strength of the association between the risk of the composite event and whether the patients experienced hepatomegaly during follow-up. We will describe the patient-specific profiles over time for this biomarker using a mixed-effects logistic model, where both in the fixed- and random-effects we include an intercept and the time effect. The syntax to fit this model with `mixed_model()` is:
+Our aim is to assess the strength of the association between the risk of the composite event and whether the patients experienced hepatomegaly during follow-up. We will describe the patient-specific profiles over time for this biomarker using a mixed-effects logistic model, where we include an intercept and the time effect in both fixed and random effects. The syntax to fit this model with `mixed_model()` is:
 ```{r}
 fm <- mixed_model(hepatomegaly ~ year, data = pbc2, random = ~ year | id, 
                   family = binomial())
@@ -41,16 +41,16 @@ jointFit1 <- jm(CoxFit, fm, time_var = "year")
 summary(jointFit1)
 ```
 
-In the output this is named as `value(hepatomegaly)` to denote that the current value functional form is used. That is, we assume that the risk at a specific time $t$ is associated with the value of the linear predictor of the longitudinal outcome at the same time point $t$. In this particular case, the subject-specific linear predictor denotes the log odds of experiencing hepatomegaly at time $t$.
+In the output, this is named `value(hepatomegaly)` to denote that the current value functional form is used. That is, we assume that the risk at a specific time $t$ is associated with the value of the linear predictor of the longitudinal outcome at the same time point $t$. In this case, the subject-specific linear predictor denotes the log odds of experiencing hepatomegaly at time $t$.
 
 ## Transformation functions
-The fact that the default version of the current value functional form is on the linear predictor scale of the mixed model may be problematic to interpret when this linear predictor is connected with a nonlinear link function to mean of the longitudinal outcome. In these situations we may want to transform the subject-specific linear predictor back to the scale of the outcome. To achieve this we can use a transformation function. Continuing on the previous example, we update `jointFit1` by now linking the expit transformation of the linear predictor (i.e., $\mbox{expit}(x) = \exp(x) / \{1 + \exp(x)\}$) with the risk of an event. This is done using the `vexpit()` function:
+The fact that the default version of the current value functional form is on the linear predictor scale of the mixed model may be problematic to interpret when this linear predictor is connected with a nonlinear link function to the mean of the longitudinal outcome. In these situations, we may want to transform the subject-specific linear predictor back to the scale of the outcome. To achieve this, we can use a transformation function. Continuing on the previous example, we update `jointFit1` by now linking the expit transformation of the linear predictor (i.e., $\mbox{expit}(x) = \exp(x) / \{1 + \exp(x)\}$) with the risk of an event. This is done using the `vexpit()` function:
 ```{r}
 jointFit2 <- update(jointFit1, functional_forms = ~ vexpit(value(hepatomegaly)))
 summary(jointFit2)
 ```
 
-Other available functions to use in the definition of the `functional_forms` argument are `vexp()` for calculating the exponent, `vlog()` for calculating the natural logarithm, and `vsqrt()` for calculating the square root.
+Other available functions to use in the definition of the `functional_forms` argument are `vexp()` to calculate the exponent, `vlog()` to calculate the natural logarithm, and `vsqrt()` to calculate the square root.
 
 If we want to include the time-varying slope of the transformed linear predictor, we also have the `Dexpit()` and `Dexp()` functions available. As an example, we extend `jointFit2` by including the derivative of the $\mbox{expit}()$ transformation:
 ```{r}