Fix #61 explain yi exclusions when can't standardize constructed variables
egouldo · Aug 5, 2024 · 17e21a7
1 parent 7c31a81

Showing 2 changed files with 89 additions and 1 deletion.
diff --git a/index.qmd b/index.qmd
@@ -2190,8 +2190,11 @@ We used the square of the SE associated with predicted values as the sampling va
 
 ::: {.callout-note appearance="simple"}
 **Preregistration Deviation:** Because analysts of blue tit data chose different dependent variables on different scales, after transforming out-of-sample values to the original scales, we standardized all values as z scores ('standard scores') to put all dependent variables on the same scale and make them comparable.
-This involved taking each relevant value on the original scale (whether a predicted point estimate or a SE associated with that estimate) and subtracting the value in question from the mean value of that dependent variable derived from the full dataset and then dividing this difference by the standard deviation, SD, corresponding to the mean from the full dataset.
+This involved taking each predicted point estimate on the original scale, subtracting the mean value of that dependent variable derived from the full dataset, and dividing this difference by the corresponding standard deviation (SD) from the full dataset; each SE associated with an estimate was divided by the same SD (@eq-Z-VZ).
 Thus, all our out-of-sample prediction values from the blue tit data are from a distribution with the mean of 0 and SD of 1.
+
+Note that we were unable to standardize some analyst-constructed variables, so analyses using these variables were excluded from the final meta-analysis of out-of-sample estimates; see @sec-excluded-yi for details.
+
 We did not add this step for the *Eucalyptus* data because (a) all responses were on the same scale (counts of *Eucalyptus* stems) and were thus comparable and (b) these data, with many zeros and high skew, are poorly suited for z scores.
 :::
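
As a rough illustration of the standardization described in this callout, the sketch below applies the two expressions in @eq-Z-VZ in base R. The helper name `standardise_prediction()` and all numbers are hypothetical; this is not the implementation of `ManyEcoEvo::Z_VZ_preds()`, only a minimal example under those assumptions.

```r
# Hypothetical helper: NOT the ManyEcoEvo::Z_VZ_preds() implementation.
# It applies the two expressions in @eq-Z-VZ as written in the commit.
standardise_prediction <- function(y_hat, se, mu, sd_pop) {
  stopifnot(sd_pop > 0)
  data.frame(
    Z  = (y_hat - mu) / sd_pop, # centre on the 'population' mean, scale by SD
    VZ = se / sd_pop            # scale the SE by the same SD
  )
}

# Made-up values standing in for a response variable in the full dataset
full_response <- c(8, 10, 12, 14, 16)
mu     <- mean(full_response)
sd_pop <- sd(full_response)

# Made-up out-of-sample predictions and their standard errors
preds <- standardise_prediction(
  y_hat  = c(10, 12, 15),
  se     = c(0.5, 0.8, 1.1),
  mu     = mu,
  sd_pop = sd_pop
)
preds
```

A prediction equal to the full-dataset mean maps to a z score of 0, which is why the mean and SD of the response variable must be computable before an analysis can be standardized.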
 

diff --git a/supp_mat/SM2_EffectSizeAnalysis.qmd b/supp_mat/SM2_EffectSizeAnalysis.qmd
@@ -638,6 +638,91 @@ ManyEcoEvo_viz %>%
 
 ### Out of sample predictions $y_i$
 
+#### Excluded analyses with constructed variables {#sec-excluded-yi}
+
+```{r}
+#| label: excluded-constructed-yi
+#| echo: false
+#| message: false
+
+# Constructed Variables Included in the ManyAnalysts meta-analysis
+ManyEcoEvo_constructed_vars <-
+  tribble(~response_variable_name,
+          "euc_sdlgs_all",
+          "euc_sdlgs>50cm",
+          "euc_sdlgs0_2m",
+          "small*0.25+medium*1.25+large*2.5",
+          "euc_sdlgs50cm_2m",
+          "average.proportion.of.plots.containing.at.least.one.euc.seedling.of.any.size",
+          "day_14_weight/(day_14_tarsus_length^2)",
+          "day_14_weight/day_14_tarsus_length",
+          "day_14_weight*day_14_tarsus_length"
+  )
+
+# Analyst Constructed Variables
+all_constructed_vars <- 
+  ManyEcoEvo %>% 
+    pull(data, dataset) %>% 
+    list_rbind(names_to = "dataset") %>% 
+    filter(str_detect(response_variable_type, "constructed")) %>% 
+    distinct(dataset,response_variable_name) %>% 
+    drop_na() %>% 
+    arrange(dataset, response_variable_name)
+
+by <- join_by(response_variable_name)
+
+excluded_yi_constructed <- 
+  ManyEcoEvo %>% 
+  pull(data, dataset) %>% 
+  list_rbind(names_to = "dataset") %>% 
+  filter(str_detect(response_variable_type, "constructed")) %>% 
+  distinct(dataset, id_col, TeamIdentifier, response_variable_name) %>% 
+  drop_na() %>% 
+  anti_join(ManyEcoEvo_constructed_vars, by)
+
+n_dropped_analyses <- 
+  excluded_yi_constructed %>% 
+  pull(id_col) %>% 
+  n_distinct()
+
+n_teams_w_dropped_analyses <- 
+  excluded_yi_constructed %>% 
+  pull(TeamIdentifier) %>% 
+  n_distinct()
+```
+
+We standardized the $y_i$ estimates for the blue tit analyses with the function `ManyEcoEvo::Z_VZ_preds()`, using the population mean and standard deviation of the relevant response variable for each analysis, as shown in @eq-Z-VZ. We used the mean and standard deviation of that response variable, computed from the full dataset, as our 'population' parameters.
+
+$$
+\begin{aligned} Z_i &= \frac{\hat{y}_i - \mu}{\text{SD}} \\
+\text{VAR}_{Z_i} &= \frac{\text{SE}_{\hat{y}_i}}{\text{SD}} \end{aligned}
+$$ {#eq-Z-VZ}
+
+For some analyses of the blue tit dataset, analysts constructed their own response variables, so we also needed to construct these variables in order to calculate the population parameters. Unfortunately, we could not reconstruct every variable used by the analysts because we were unable to reproduce the exact dataset required for their reconstruction. Included and excluded constructed variables are listed in @tbl-constructed-var-exclusions. A total of `r n_dropped_analyses` analyses from `r n_teams_w_dropped_analyses` teams were excluded from the out-of-sample meta-analysis, with the following analysis identifiers: `r pull(excluded_yi_constructed, id_col) %>% gluedown::md_code() %>% glue::glue_collapse(", ",last = " and ")`.
+
+```{r}
+#| label: tbl-constructed-var-exclusions
+
+all_constructed_vars %>% 
+  semi_join(ManyEcoEvo_constructed_vars, by) %>% 
+  mutate(included_in_yi = TRUE) %>% 
+  bind_rows(
+    {
+      all_constructed_vars %>% 
+        anti_join(ManyEcoEvo_constructed_vars, by) %>% 
+        mutate(included_in_yi = FALSE)
+    }
+  ) %>% 
+  dplyr::filter(dataset != "eucalyptus") %>% # not excluded as standardisation not needed
+  dplyr::mutate(included_in_yi = ifelse(included_in_yi, "check", "xmark")) %>% # vectorised test; isTRUE() is scalar-only
+  gt::gt() %>% 
+  gt::cols_label(response_variable_name = "Constructed Variable",
+                 included_in_yi = gt::md("Included in $y\\_i$ meta-analysis?")) %>% 
+  gtExtras::gt_fa_column(included_in_yi) %>% 
+  gt::cols_hide("dataset")
+
+```
+
 #### Non-truncated $y_{i}$ meta-analysis forest plot
 
 Below is the non-truncated version of @fig-euc-yi-forest-plot showing a forest plot of the out-of-sample predictions, $y_{i}$, on the response-scale (stems counts), for *Eucalyptus* analyses, showing the full error bars of all model estimates.
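
The table-building chunk added in `SM2_EffectSizeAnalysis.qmd` splits the constructed variables into included and excluded sets with `semi_join()` and `anti_join()`, then maps a logical flag to an icon name. The sketch below shows that pattern on toy data, assuming dplyr 1.1.0 or later for `join_by()`. The variable names and the objects `all_vars`, `reconstructed`, and `flagged` are made up for illustration and do not come from the ManyEcoEvo data.

```r
library(dplyr)

# Hypothetical constructed variables (names made up for illustration)
all_vars <- tibble::tribble(
  ~dataset,   ~response_variable_name,
  "blue tit", "weight/tarsus",
  "blue tit", "weight*tarsus",
  "blue tit", "custom_index"
)

# Hypothetical subset of variables that could be reconstructed
reconstructed <- tibble::tibble(
  response_variable_name = c("weight/tarsus", "weight*tarsus")
)

by <- join_by(response_variable_name)

flagged <- bind_rows(
  # rows with a match in `reconstructed` are flagged as included
  semi_join(all_vars, reconstructed, by) %>% mutate(included_in_yi = TRUE),
  # rows without a match are flagged as excluded
  anti_join(all_vars, reconstructed, by) %>% mutate(included_in_yi = FALSE)
) %>%
  # ifelse() is vectorised, so each row gets its own icon name
  mutate(icon = ifelse(included_in_yi, "check", "xmark"))

flagged
```

Here `ifelse()` receives the logical column directly. `isTRUE()` is a scalar test that returns `FALSE` for any vector longer than one element, so it cannot be used to flag rows individually.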