Using pointblank for panel / repeated measures data #297
Replies: 2 comments 2 replies
-
Hi Emily, thanks for bringing this up. I sketched out some code and ideas on how to solve this but ran into some dead ends. The problems are fixable, but I think it would be great to share the code and see which direction might be best. Here's the exploratory R script:

```r
# devtools::install_github("rich-iannone/intendo")
library(intendo)
library(pointblank)
library(tidyverse)

# Let's take `intendo::sj_all_revenue` and modify it so that Germany didn't
# come online until 2015-02-01
all_revenue_modified <-
  intendo::sj_all_revenue %>%
  filter(!(country == "Germany" &
             session_start < lubridate::ymd_hms("2015-02-01 00:00:00"))) %>%
  filter(country %in% c("United States", "Canada", "Germany", "Spain"))

# New function in pointblank stores table-prep formulas; it's low on features
# right now, but you can at least materialize a table by its stored name (the
# LHS of the formula) or get its table-prep formula (usable in the `read_fn` arg)
tbls <-
  tbl_store(
    all_revenue ~ all_revenue_modified,
    all_revenue_us ~ all_revenue_modified %>% dplyr::filter(country == "United States"),
    all_revenue_ca ~ all_revenue_modified %>% dplyr::filter(country == "Canada"),
    all_revenue_de ~ all_revenue_modified %>% dplyr::filter(country == "Germany"),
    all_revenue_es ~ all_revenue_modified %>% dplyr::filter(country == "Spain")
  )

# Note on the above: it would be nice to refer to the name of a
# table that is being mutated, like this:
#' tbls <-
#'   tbl_store(
#'     all_revenue ~ all_revenue_modified,
#'     all_revenue_us ~ ..all_revenue.. %>% dplyr::filter(country == "United States"),
#'     ...
#'   )
# Then the `tbl_store()` function could dynamically complete the sequence at
# request time (i.e., through `tbl_get()` or `tbl_source()`)

# Materializing tables from `tbl_get()` calls
tbl_get("all_revenue", tbls)
tbl_get("all_revenue_us", tbls)
tbl_get("all_revenue_ca", tbls)
tbl_get("all_revenue_de", tbls)
tbl_get("all_revenue_es", tbls)

# Extracting the table-prep formulas
tbl_source("all_revenue", tbls)
tbl_source("all_revenue_us", tbls)
tbl_source("all_revenue_ca", tbls)
tbl_source("all_revenue_de", tbls)
tbl_source("all_revenue_es", tbls)

# Underneath, these are two-sided formulas
unclass(tbl_source("all_revenue", tbls))                      # `all_revenue ~ all_revenue_modified`
rlang::is_formula(unclass(tbl_source("all_revenue", tbls)))   # TRUE

# Here's a plot that shows us that sessions in Germany begin in February
ggplot(all_revenue_modified) +
  geom_point(aes(x = session_start, y = item_revenue), alpha = 0.25) +
  facet_wrap(~country)

# Validate by group; I want this to work, but it doesn't, because
# `preconditions` only accepts a functional sequence (`. %>% filter(...)`),
# not the two-sided formulas that `tbl_source()` returns.
# It would be great if this did work, though
create_agent(
  read_fn = ~ intendo::sj_all_revenue,
  tbl_name = "all_revenue",
  label = "All Revenue for 2015",
  actions = action_levels(warn_at = 0.01, stop_at = 0.05)
) %>%
  col_vals_gte(
    "session_start", lubridate::ymd_hms("2015-01-01 00:00:00"),
    preconditions = tbl_source("all_revenue_us", tbls)
  ) %>%
  col_vals_gte(
    "session_start", lubridate::ymd_hms("2015-01-01 00:00:00"),
    preconditions = tbl_source("all_revenue_ca", tbls)
  ) %>%
  col_vals_gte(
    "session_start", lubridate::ymd_hms("2015-02-01 00:00:00"),
    preconditions = tbl_source("all_revenue_de", tbls)
  ) %>%
  col_vals_gte(
    "session_start", lubridate::ymd_hms("2015-02-01 00:00:00"),
    preconditions = tbl_source("all_revenue_es", tbls)
  ) %>%
  interrogate()
```

Some of my ideas (all requiring some changes, but nothing too substantial) are to: (1) improve …

I'll continue to explore these. The third idea might be an acceptable workaround (or even the best solution? I'll have to try it!). I think this is an important use case to solve for, so if any development is required, I'd be happy to take that on.
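With the API as it stands, one way to keep per-group checks in a single agent is to generate one validation step per group and write each `preconditions` as a one-sided formula holding a functional sequence (`~ . %>% dplyr::filter(...)`), the form the script's comment says is accepted. This is a sketch under assumptions, not something from the script: the toy `revenue` table and the `start_dates` lookup are hypothetical. The group value is injected into each formula as a literal with `rlang::expr()` and `!!`, because the formula is only evaluated later at interrogation time, so a bare `ctry` reference inside the loop might not resolve to the intended country.

```r
library(pointblank)
library(dplyr)

# Hypothetical toy table standing in for `all_revenue_modified`
revenue <- dplyr::tibble(
  country = c("United States", "Canada", "Germany", "Germany"),
  session_start = lubridate::ymd_hms(c(
    "2015-01-03 10:00:00", "2015-01-05 12:00:00",
    "2015-02-02 09:00:00", "2015-02-10 18:30:00"
  ))
)

# Hypothetical lookup: earliest allowed session start for each country
start_dates <- c(
  "United States" = "2015-01-01 00:00:00",
  "Canada"        = "2015-01-01 00:00:00",
  "Germany"       = "2015-02-01 00:00:00"
)

agent <- create_agent(
  tbl = revenue,
  tbl_name = "revenue",
  label = "Session start dates by country"
)

# Add one `col_vals_gte()` step per country; the country name is injected
# as a literal string so that each precondition filters to its own group
for (ctry in names(start_dates)) {
  precondition <- rlang::new_formula(
    lhs = NULL,
    rhs = rlang::expr(. %>% dplyr::filter(country == !!ctry))
  )
  agent <- agent %>%
    col_vals_gte(
      "session_start",
      value = lubridate::ymd_hms(start_dates[[ctry]]),
      preconditions = precondition
    )
}

agent <- interrogate(agent)
```

Because all of the steps live in one agent, `get_agent_report(agent)` yields a single report covering every country, which is what the one-agent-per-group approach makes awkward.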
-
Thanks for all of the thoughts on this, @rich-iannone! I think your example makes a really good point that there are many different potential uses for groups: …

Even if all of those are implemented, it's good to realize how many different ways people could interpret a "group" function! I do like the idea of using … I'll open a general issue so we can keep discussing!
-
I'm curious what strategies people use when applying pointblank to panel or repeated measures data. For example, if one is analyzing data for a set of customers, it could be useful to specify checks within observations, such as: …

pointblank has great strategies for doing these checks on a full dataset (or, equivalently, a single group), but I can't think of a good approach for running them across a large number of groups besides using a `nest`-`map` paradigm and creating a separate agent for each group. However, this feels inelegant, is hard to combine into a single final report, and is harder to implement on remote data sources (e.g., running the checks against a database).

Thanks for any thoughts!
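For checks that depend on other rows of the same group, one possible workaround (not proposed in the thread) is to do the grouping inside `preconditions` with ordinary dplyr verbs. A single agent then still covers every group and produces one report. The sketch uses a hypothetical `visits` table with hypothetical columns `customer_id`, `visit_date`, and `cumulative_spend`. The first step checks that, within each customer, `cumulative_spend` never decreases from one visit to the next. The second checks that every customer has at least two visits. Because the grouping is written as dplyr verbs, the same preconditions might translate to SQL window functions on a remote table via dbplyr, but that has not been verified here.

```r
library(pointblank)
library(dplyr)

# Hypothetical repeated-measures table: several visits per customer
visits <- dplyr::tibble(
  customer_id      = c("a", "a", "a", "b", "b"),
  visit_date       = as.Date(c("2021-01-01", "2021-02-01", "2021-03-01",
                               "2021-01-15", "2021-02-15")),
  cumulative_spend = c(10, 25, 25, 5, 3)  # customer "b" decreases: 5 -> 3
)

agent <-
  create_agent(tbl = visits, tbl_name = "visits", label = "Within-customer checks") %>%
  # Step 1: within each customer, ordered by date, the change in cumulative
  # spend since the previous visit must be >= 0; each customer's first visit
  # has no previous value, so its NA is allowed via `na_pass = TRUE`
  col_vals_gte(
    "spend_change",
    value = 0,
    na_pass = TRUE,
    preconditions = ~ . %>%
      dplyr::arrange(customer_id, visit_date) %>%
      dplyr::group_by(customer_id) %>%
      dplyr::mutate(spend_change = cumulative_spend - dplyr::lag(cumulative_spend)) %>%
      dplyr::ungroup()
  ) %>%
  # Step 2: collapse to one row per customer and require at least two visits
  col_vals_gte(
    "n",
    value = 2,
    preconditions = ~ . %>% dplyr::count(customer_id)
  ) %>%
  interrogate()
```

The trade-off is that each step's test units are rows of the transformed table (in step 2, one row per customer) rather than rows of the original data.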