Canned median quantile #435

Merged
merged 12 commits into from
Feb 8, 2025

@dsweber2 dsweber2 commented Feb 7, 2025

Checklist

Please:

  • Make sure this PR is against "dev", not "main".
  • Request a review from one of the current epipredict main reviewers:
    dajmcdon.
  • Make sure to bump the version number in DESCRIPTION and NEWS.md.
    Always increment the patch version number (the third number), unless you are
    making a release PR from dev to main, in which case increment the minor
    version number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    0.7.2, then write your changes under the 0.8 heading).
  • Consider pinning the epiprocess version in the DESCRIPTION file if
    • You anticipate breaking changes in epiprocess soon
    • You want to co-develop features in epipredict and epiprocess

Change explanations for reviewer

This is a minor patch that adds the median as a default fit quantile, which deals with some problems caught in #431. When too few quantiles are fit, the median of the resulting distribution can change radically after thresholding, so this makes sure the median is itself fit and included in the distribution. It doesn't solve extrapolation, which is left open as an issue.

Edit:
this is a patch more or less, with a more principled fix discussed in #434

@dsweber2 dsweber2 requested a review from dajmcdon as a code owner February 7, 2025 19:15
@dsweber2 dsweber2 self-assigned this Feb 7, 2025

dajmcdon commented Feb 7, 2025

It's not entirely obvious to me that:

  1. The bug shown is fixed by this change.
  2. This change doesn't introduce other unexpected behaviour.

Can you explain a bit why this should result in the intended behaviour? It seems to me that there are at least two things going on:

  1. autoplot has a default 95% interval, but flatline doesn't fit those quantile levels by default (neither do the other canned forecasters). This means that autoplot tries to extrapolate them, and the extrapolation can go negative (possibly suggesting that a fix should be made to the extrapolation implementation).
  2. The flat point forecast is clearly very different from the centre of the plotted intervals. I think the intervals there are the problem.
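
To make the first point concrete, here is a minimal base R sketch of how extrapolating beyond the fitted levels can produce a negative value. The numbers are hypothetical, and the straight-line rule is an assumption used only for illustration; it is not claimed to be what epipredict does internally.

```r
# Hypothetical fitted quantiles for a small, non-negative rate
# (illustrative values, not taken from the PR).
levels <- c(0.05, 0.95)
values <- c(0.01, 0.45)

# Levels needed for a 95% interval, which were not fit.
wanted <- c(0.025, 0.975)

# Assumed rule: extend the straight line through the two fitted points.
slope <- diff(values) / diff(levels)
extrapolated <- values[1] + slope * (wanted - levels[1])

print(extrapolated)
print(any(extrapolated < 0))
```

With these inputs the extrapolated lower bound falls just below zero even though both fitted quantiles are positive.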


dajmcdon commented Feb 7, 2025

FYI: there's a missing dplyr:: on line 186 of autoplot.R

Co-authored-by: Dmitry Shemetov <[email protected]>

dsweber2 commented Feb 7, 2025

You're right that it's the quantiles that are wrong. The problem is that the interpolated quantiles get wrecked when one of the two fitted quantiles is adjusted by thresholding, so the median of the quantiles is no longer equal to the point prediction.

This is a fix for the median of the quantiles not being equal to the point prediction. It doesn't fix the fact that there are negative extrapolated quantiles; I'll add your suggestion for that in a minute.
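
A small base R sketch of the effect described above, using made-up numbers and plain linear interpolation via `stats::approx()` as a stand-in (epipredict's own interpolation may differ):

```r
# Hypothetical forecast: point prediction 0.2 with a symmetric 90% interval
# whose lower end is negative (illustrative values, not from the PR).
point <- 0.2

# Case 1: only the two extreme quantiles are fit.
levels2 <- c(0.05, 0.95)
values2 <- c(-0.2, 0.6)
thresholded2 <- pmax(values2, 0)  # threshold at zero
median_two <- approx(levels2, thresholded2, xout = 0.5)$y

# Case 2: the median is fit as well, so thresholding cannot move it.
levels3 <- c(0.05, 0.5, 0.95)
values3 <- c(-0.2, 0.2, 0.6)
thresholded3 <- pmax(values3, 0)
median_three <- approx(levels3, thresholded3, xout = 0.5)$y

print(c(point = point, two_levels = median_two, three_levels = median_three))
```

In the two-level case the interpolated median moves from 0.2 to about 0.3 once the lower quantile is raised to zero, while in the three-level case it stays at the point prediction.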

compare

the case with only 2 extreme quantiles

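# assumes covid_case_death_rates is available once these packages are attached
library(dplyr)
library(epipredict)
forecast_date <- as.Date("2021-08-01")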
used_locations <- c("ca", "ma", "ny", "tx")
all_flatlines <- lapply(
  seq(0, 28, by = 7),
  \(days_ahead) {
    flatline_forecaster(
      covid_case_death_rates |>
        filter(time_value <= forecast_date, geo_value %in% used_locations),
      outcome = "death_rate",
      args_list = flatline_args_list(
        ahead = days_ahead,
        quantile_levels = c(0.05, 0.95)
      )
    )
  }
)
# same plotting code as in the arx multi-ahead case
workflow <- all_flatlines[[1]]$epi_workflow
results <- purrr::map_df(all_flatlines, ~ `$`(., "predictions"))
results %>% filter(target_date == max(target_date))
autoplot(
  object = workflow,
  predictions = results
)

ex2

the case with a median quantile (the same code, but with quantile_levels = c(0.05, 0.5, 0.95))

forecast_date <- as.Date("2021-08-01")
used_locations <- c("ca", "ma", "ny", "tx")
all_flatlines <- lapply(
  seq(0, 28, by = 7),
  \(days_ahead) {
    flatline_forecaster(
      covid_case_death_rates |>
        filter(time_value <= forecast_date, geo_value %in% used_locations),
      outcome = "death_rate",
      args_list = flatline_args_list(
        ahead = days_ahead,
        # the only change from the first example: the median is fit too
        quantile_levels = c(0.05, 0.5, 0.95)
      )
    )
  }
)
# same plotting code as in the arx multi-ahead case
workflow <- all_flatlines[[1]]$epi_workflow
results <- purrr::map_df(all_flatlines, ~ `$`(., "predictions"))
results %>% filter(target_date == max(target_date))
autoplot(
  object = workflow,
  predictions = results
)

ex1

@dsweber2 dsweber2 merged commit 7a103fa into dev Feb 8, 2025
2 checks passed
@dshemetov dshemetov deleted the canned_median_quantile branch February 8, 2025 22:13