read_google_mobility function crashing R 4.3 #8

Open
sckott opened this issue Apr 25, 2023 · 0 comments

👋🏽 @jebyrnes

So I'm working with @aedobbyn, who's contributed here too, and we just ran into a problem with this package, I think because of the transition to R 4.3. I'm not totally sure yet what caused it, but a workflow of ours that uses this package just started to fail. The problem I found is in the covid19mobility::read_google_mobility() function. In R 4.2 it seems to run fine, but in R 4.3 it fails in the call to tidyr::pivot_longer at https://github.com/Covid19R/covid19mobility/blob/master/R/refresh_covid19mobility_google.R#L292-L319

library(magrittr) # provides the %>% pipe used in the pivot step

url <- "https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv"

gmob <- readr::read_csv(url,
  na = c("", "N/A"), # col_types = "ccccDiiiiii")
  col_types = readr::cols(
    country_region_code = readr::col_character(),
    country_region = readr::col_character(),
    sub_region_1 = readr::col_character(),
    sub_region_2 = readr::col_character(),
    iso_3166_2_code = readr::col_character(),
    census_fips_code = readr::col_character(),
    date = readr::col_date(),
    retail_and_recreation_percent_change_from_baseline = readr::col_integer(),
    grocery_and_pharmacy_percent_change_from_baseline = readr::col_integer(),
    parks_percent_change_from_baseline = readr::col_integer(),
    transit_stations_percent_change_from_baseline = readr::col_integer(),
    workplaces_percent_change_from_baseline = readr::col_integer(),
    residential_percent_change_from_baseline = readr::col_integer()
  )
)

gmob_longer <- gmob %>%
  dplyr::rename(location_code = country_region_code) %>%
  tidyr::pivot_longer( # HERE'S THE FAILURE
    cols = retail_and_recreation_percent_change_from_baseline:residential_percent_change_from_baseline,
    names_to = "data_type",
    values_to = "value"
  )

The readr::read_csv call works fine and creates the ~11 million row data frame, but then tidyr::pivot_longer fails and crashes R. It runs fine when a smaller set of rows is passed to it, e.g., 2 million or 5 million. But with ~9 million or more rows, I found that tidyr::pivot_longer kills R.

I haven't yet dug into a debugger, so I don't know if this is a tidyr issue or something below it.

Since it seems like it's about the size of the data, perhaps one could chunk the data and pass smaller chunks of data to tidyr::pivot_longer and then recombine them?
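
To make the chunking idea concrete, here's a rough sketch of what I have in mind. The helper name pivot_longer_chunked and the chunk_size default are made up for illustration and are not part of covid19mobility or tidyr. It assumes the crash really is tied to the number of rows passed to a single tidyr::pivot_longer call, which I haven't confirmed.

```r
# Hypothetical helper (not in covid19mobility or tidyr): pivot a data frame
# in row chunks and recombine, so no single pivot_longer() call sees all rows.
pivot_longer_chunked <- function(data, cols,
                                 names_to = "data_type",
                                 values_to = "value",
                                 chunk_size = 1e6) {
  # Resolve the tidyselect expression once, against the full data
  col_names <- names(tidyselect::eval_select(rlang::enquo(cols), data))

  n <- nrow(data)
  # Start row of each chunk; a single "chunk" when the data is empty
  starts <- if (n > 0) seq(1, n, by = chunk_size) else 1

  pieces <- lapply(starts, function(start) {
    end <- min(start + chunk_size - 1, n)
    chunk <- dplyr::slice(data, seq_len(end - start + 1) + start - 1)
    tidyr::pivot_longer(
      chunk,
      cols = dplyr::all_of(col_names),
      names_to = names_to,
      values_to = values_to
    )
  })

  # Chunks are processed in row order, so binding them preserves the
  # order a single pivot_longer() call would have produced
  dplyr::bind_rows(pieces)
}
```

With the gmob data frame from above, this would be called with cols = retail_and_recreation_percent_change_from_baseline:residential_percent_change_from_baseline after the rename step. The combined result is still six times as long as the input, so this only helps if the failure is in the single large pivot and not in holding the final result in memory.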
