Skip to content

Commit

Permalink
#80 Add quarto markdown explaining package data generation
Browse files Browse the repository at this point in the history
And explain implicit exclusions from yi meta-analysis
  • Loading branch information
egouldo committed Aug 5, 2024
1 parent e6f9e82 commit a3edfa4
Showing 1 changed file with 74 additions and 0 deletions.
74 changes: 74 additions & 0 deletions data-raw/analysis_datasets/README.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
---
title: "README: `data-raw/analysis_datasets`"
format: gfm
---

This directory contains the analysis datasets used by analysis teams to answer the project's research questions.

- `blue_tit_data_updated_2020-04-18.csv` contains the Evolutionary Ecology or Blue tit dataset, stored at <https://osf.io/34fzc/>.
- `Euc_data.csv` contains the Ecology and Conservation or Eucalyptus dataset, stored at <https://osf.io/t76uy/>.

The two files are downloaded to this directory using the script `osf_load_analyst_datasets.R`, which uses the [`osfr::` package](https://github.com/ropensci/osfr).

For assistance with setting up authentication to run this script, check out <https://cran.r-project.org/web/packages/osfr/vignettes/auth.html>. Alternatively, simply download the files directly from the OSF using the URL's above. The script downloads the files, loads them into R, and then calculates some additional variables, which were variables constructed by analysts, but not present in the supplied analysis datasets.

This script creates the package data `euc_data` and `blue_tit_data`.

## A note on additional variable construction in `osf_load_analyst_datasets.R` and potential for implicit exclusion

Note that for inclusion in the out-of-sample predictions $y_i$ meta-analysis, parameter tables including the mean and SD for each response variable are needed standardise the data. If the variable is not present in either `euc_data` or `blue_tit_data` (i.e. it isn't calculated in `osf_load_analyst_datasets.R`, nor present in the original data files downloaded from the OSF), then analyses using that response variable are excluded from further meta-analysis. See table below for constructed variables that are included/excluded in the out-of-sample predictions meta-analysis:

```{R}
#| label: tbl-constructed-variables
#| results: asis
#| echo: true
#| message: false
#| code-fold: true
library(ManyEcoEvo)
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)
library(tibble)
# Constructed Variables Included in the ManyAnalysts meta-analysis
ManyEcoEvo_constructed_vars <-
tribble(~response_variable_name,
"euc_sdlgs_all",
"euc_sdlgs>50cm",
"euc_sdlgs0_2m",
"small*0.25+medium*1.25+large*2.5",
"euc_sdlgs50cm_2m",
"average.proportion.of.plots.containing.at.least.one.euc.seedling.of.any.size",
"day_14_weight/(day_14_tarsus_length^2)",
"day_14_weight/day_14_tarsus_length",
"day_14_weight*day_14_tarsus_length"
)
# Analyst Constructed Variables
all_constructed_vars <-
ManyEcoEvo %>%
pull(data, dataset) %>%
list_rbind(names_to = "dataset") %>%
filter(str_detect(response_variable_type, "constructed")) %>%
distinct(response_variable_name) %>%
drop_na() %>%
arrange()
by <- join_by(response_variable_name)
all_constructed_vars %>%
semi_join(ManyEcoEvo_constructed_vars, by) %>%
mutate(included_in_yi = TRUE) %>%
bind_rows(
{
all_constructed_vars %>%
anti_join(ManyEcoEvo_constructed_vars, by) %>%
mutate(included_in_yi = FALSE)
}
) %>%
knitr::kable(col.names = c("Constructed Variable",
"Included in $y_i$ meta-analysis?"),
format = "markdown")
```

0 comments on commit a3edfa4

Please sign in to comment.