Commit

Fix #61 explain yi exclusions when can't standardize constructed variables
egouldo committed Aug 5, 2024
1 parent 7c31a81 commit 17e21a7
Showing 2 changed files with 89 additions and 1 deletion.
5 changes: 4 additions & 1 deletion index.qmd
@@ -2190,8 +2190,11 @@ We used the square of the SE associated with predicted values as the sampling va
::: {.callout-note appearance="simple"}
**Preregistration Deviation:** Because analysts of blue tit data chose different dependent variables on different scales, after transforming out-of-sample values to the original scales, we standardized all values as z scores ('standard scores') to put all dependent variables on the same scale and make them comparable.
This involved taking each relevant value on the original scale (whether a predicted point estimate or a SE associated with that estimate) and subtracting the value in question from the mean value of that dependent variable derived from the full dataset and then dividing this difference by the standard deviation, SD, corresponding to the mean from the full dataset.
This involved taking each predicted point estimate on the original scale, subtracting the mean value of that dependent variable derived from the full dataset, and then dividing this difference by the standard deviation, SD, corresponding to the mean from the full dataset; each SE associated with an estimate was divided by the same SD (@eq-Z-VZ).
Thus, all our out-of-sample prediction values from the blue tit data are from a distribution with a mean of 0 and an SD of 1.
Note that we were unable to standardize some analyst-constructed variables, so these analyses were excluded from the final meta-analysis of out-of-sample estimates; see @sec-excluded-yi for details and explanation.
We did not add this step for the *Eucalyptus* data because (a) all responses were on the same scale (counts of *Eucalyptus* stems) and were thus comparable and (b) these data, with many zeros and high skew, are poorly suited for z scores.
:::
85 changes: 85 additions & 0 deletions supp_mat/SM2_EffectSizeAnalysis.qmd
@@ -638,6 +638,91 @@ ManyEcoEvo_viz %>%

### Out of sample predictions $y_i$

#### Excluded analyses with constructed variables {#sec-excluded-yi}

```{r}
#| label: excluded-constructed-yi
#| echo: false
#| message: false
# Constructed Variables Included in the ManyAnalysts meta-analysis
ManyEcoEvo_constructed_vars <-
  tribble(~response_variable_name,
          "euc_sdlgs_all",
          "euc_sdlgs>50cm",
          "euc_sdlgs0_2m",
          "small*0.25+medium*1.25+large*2.5",
          "euc_sdlgs50cm_2m",
          "average.proportion.of.plots.containing.at.least.one.euc.seedling.of.any.size",
          "day_14_weight/(day_14_tarsus_length^2)",
          "day_14_weight/day_14_tarsus_length",
          "day_14_weight*day_14_tarsus_length"
  )
# Analyst-constructed variables
all_constructed_vars <-
  ManyEcoEvo %>%
  pull(data, dataset) %>%
  list_rbind(names_to = "dataset") %>%
  filter(str_detect(response_variable_type, "constructed")) %>%
  distinct(dataset, response_variable_name) %>%
  drop_na() %>%
  arrange(dataset, response_variable_name)

by <- join_by(response_variable_name)

# Analyses whose constructed response variable we could not reconstruct
excluded_yi_constructed <-
  ManyEcoEvo %>%
  pull(data, dataset) %>%
  list_rbind(names_to = "dataset") %>%
  filter(str_detect(response_variable_type, "constructed")) %>%
  distinct(dataset, id_col, TeamIdentifier, response_variable_name) %>%
  drop_na() %>%
  anti_join(ManyEcoEvo_constructed_vars, by)
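# Count distinct analyses; pull the column first so n_distinct() counts
# its values rather than whole rows of the data frame
n_dropped_analyses <-
  excluded_yi_constructed %>%
  pull(id_col) %>%
  n_distinct()

# Count distinct teams with at least one excluded analysis
n_teams_w_dropped_analyses <-
  excluded_yi_constructed %>%
  pull(TeamIdentifier) %>%
  n_distinct()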
```

We standardized the $y_i$ estimates for the blue tit analyses using the population mean and standard deviation of the relevant response variable for each analysis, as shown in @eq-Z-VZ, with the function `ManyEcoEvo::Z_VZ_preds()`. We used the mean and standard deviation of the relevant dataset from the full analysis set as our 'population' parameters.

$$
\begin{aligned}
Z_i &= \frac{\hat{y}_i - \mu}{\text{SD}} \\
\text{VAR}_{Z_i} &= \frac{\text{SE}_{\hat{y}_i}}{\text{SD}}
\end{aligned}
$$ {#eq-Z-VZ}
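
As a minimal sketch of @eq-Z-VZ, the standardization can be written in a few lines of base R. The helper `standardise_preds()`, its argument names, and the numeric values are hypothetical and for illustration only; this is not the source of `ManyEcoEvo::Z_VZ_preds()`, whose implementation may differ.

```r
# Hypothetical helper illustrating @eq-Z-VZ; not the ManyEcoEvo implementation.
# y_hat:    predicted values on the original response scale
# se_y_hat: standard errors of those predictions
# mu, sigma: 'population' mean and SD of the response variable (full dataset)
standardise_preds <- function(y_hat, se_y_hat, mu, sigma) {
  stopifnot(
    is.numeric(y_hat),
    is.numeric(se_y_hat),
    length(y_hat) == length(se_y_hat),
    sigma > 0
  )
  data.frame(
    Z  = (y_hat - mu) / sigma, # centre on the mean, then scale by the SD
    VZ = se_y_hat / sigma      # SE rescaled by the same SD, as written in @eq-Z-VZ
  )
}

# Made-up values, not taken from the blue tit data
preds <- standardise_preds(
  y_hat    = c(10, 12, 14),
  se_y_hat = c(0.5, 1, 2),
  mu       = 12,
  sigma    = 2
)
preds
# Z is -1, 0, 1 and VZ is 0.25, 0.5, 1
```

Because both quantities are divided by the same SD, the helper needs the population mean and SD of the exact response variable each analyst modelled, which is why constructed variables have to be rebuilt before they can be standardized.
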
For some analyses of the blue tit dataset, analysts constructed their own unique response variables, which meant we also needed to construct these variables in order to calculate the population parameters. Unfortunately, we were not able to reconstruct all variables used by the analysts, because we could not reproduce the exact dataset required for their reconstruction. Included and excluded constructed variables are listed in @tbl-constructed-var-exclusions. A total of `r n_dropped_analyses` analyses from `r n_teams_w_dropped_analyses` teams were excluded from the out-of-sample meta-analysis, with the following analysis identifiers: `r pull(excluded_yi_constructed, id_col) %>% gluedown::md_code() %>% glue::glue_collapse(", ",last = " and ")`.
```{r}
#| label: tbl-constructed-var-exclusions
all_constructed_vars %>%
  semi_join(ManyEcoEvo_constructed_vars, by) %>%
  mutate(included_in_yi = TRUE) %>%
  bind_rows(
    {
      all_constructed_vars %>%
        anti_join(ManyEcoEvo_constructed_vars, by) %>%
        mutate(included_in_yi = FALSE)
    }
  ) %>%
  dplyr::filter(dataset != "eucalyptus") %>% # not excluded as standardisation not needed
  # ifelse() is vectorised; isTRUE() is FALSE for any vector longer than one,
  # which would label every row "xmark"
  dplyr::mutate(included_in_yi = ifelse(included_in_yi, "check", "xmark")) %>%
  gt::gt() %>%
  gt::cols_label(response_variable_name = "Constructed Variable",
                 included_in_yi = gt::md("Included in $y\\_i$ meta-analysis?")) %>%
  gtExtras::gt_fa_column(included_in_yi) %>%
  gt::cols_hide("dataset")
```
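
The exclusion rule in this section amounts to an anti-join: an analysis is dropped when its constructed response variable is not in the list of variables we could reconstruct. The base-R sketch below illustrates that rule and the two counts reported in the text. The data frame `analyses`, the vector `reconstructable`, and every value in them are hypothetical toy data, not ManyEcoEvo records.

```r
# Toy data: hypothetical analysis identifiers, teams, and constructed variables
analyses <- data.frame(
  id_col = c("teamA-1", "teamA-2", "teamB-1", "teamB-2", "teamC-1"),
  TeamIdentifier = c("teamA", "teamA", "teamB", "teamB", "teamC"),
  response_variable_name = c(
    "weight/tarsus", "weight*tarsus", "index_x", "index_y", "weight/tarsus"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical list of constructed variables that could be rebuilt
reconstructable <- c("weight/tarsus", "weight*tarsus")

# Base-R equivalent of an anti-join on response_variable_name
is_excluded <- !(analyses$response_variable_name %in% reconstructable)
excluded <- analyses[is_excluded, ]

# Count distinct analyses and distinct teams separately:
# one team can contribute several excluded analyses
n_excluded_analyses <- length(unique(excluded$id_col))
n_excluded_teams <- length(unique(excluded$TeamIdentifier))

n_excluded_analyses # 2
n_excluded_teams    # 1
```

Counting on the identifier columns, rather than on whole rows, keeps the two totals meaningful when a team submits more than one analysis with a variable that cannot be reconstructed.
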
#### Non-truncated $y_{i}$ meta-analysis forest plot
Below is the non-truncated version of @fig-euc-yi-forest-plot, a forest plot of the out-of-sample predictions, $y_{i}$, on the response scale (stem counts) for *Eucalyptus* analyses, showing the full error bars of all model estimates.