read_google_mobility function crashing R 4.3 #8

sckott · 2023-04-25T23:02:37Z

I'm working with @aedobbyn, who has also contributed here, and we just ran into a problem with this package, possibly caused by the transition to R 4.3. I'm not totally sure yet what caused it, but a workflow of ours that uses this package just started to fail. The problem I found is in the covid19mobility::read_google_mobility() function. In R 4.2 it seems to run fine, but in R 4.3 it fails in the call to tidyr::pivot_longer at https://github.com/Covid19R/covid19mobility/blob/master/R/refresh_covid19mobility_google.R#L292-L319

library(magrittr) # provides the %>% pipe used below

url <- "https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv"

gmob <- readr::read_csv(url,
  na = c("", "N/A"), # col_types = "ccccDiiiiii")
  col_types = readr::cols(
    country_region_code = readr::col_character(),
    country_region = readr::col_character(),
    sub_region_1 = readr::col_character(),
    sub_region_2 = readr::col_character(),
    iso_3166_2_code = readr::col_character(),
    census_fips_code = readr::col_character(),
    date = readr::col_date(),
    retail_and_recreation_percent_change_from_baseline = readr::col_integer(),
    grocery_and_pharmacy_percent_change_from_baseline = readr::col_integer(),
    parks_percent_change_from_baseline = readr::col_integer(),
    transit_stations_percent_change_from_baseline = readr::col_integer(),
    workplaces_percent_change_from_baseline = readr::col_integer(),
    residential_percent_change_from_baseline = readr::col_integer()
  )
)

gmob_longer <- gmob %>%
  dplyr::rename(location_code = country_region_code) %>%
  tidyr::pivot_longer( # HERE'S THE FAILURE
    cols = retail_and_recreation_percent_change_from_baseline:residential_percent_change_from_baseline,
    names_to = "data_type",
    values_to = "value"
  )

The readr::read_csv call works fine and creates the ~11 million row data.frame, but then tidyr::pivot_longer fails and crashes R. It runs fine when a smaller set of rows is passed to it, e.g., 2 million or 5 million. But with about 9 million or more rows, I found that tidyr::pivot_longer kills R.

I haven't yet dug into a debugger, so I don't know if this is a tidyr issue or something below it.

Since it seems like it's about the size of the data, perhaps one could chunk the data and pass smaller chunks of data to tidyr::pivot_longer and then recombine them?
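A minimal sketch of that chunking idea, in case it helps. `pivot_longer_chunked` and its `chunk_size` argument are hypothetical names, not part of covid19mobility or tidyr, and the default of 1 million rows is only a guess based on the sizes that worked above. I haven't confirmed that this avoids the crash on the full data.

```r
# Hypothetical helper (not part of covid19mobility or tidyr): pivot a large
# data frame in row chunks, then recombine the long pieces in original order.
pivot_longer_chunked <- function(df, value_cols, chunk_size = 1e6) {
  n <- nrow(df)
  # First row index of each chunk; empty when the data frame has no rows
  starts <- if (n > 0) seq(1, n, by = chunk_size) else integer(0)

  pieces <- lapply(starts, function(start) {
    end <- min(start + chunk_size - 1, n)
    tidyr::pivot_longer(
      df[start:end, , drop = FALSE],
      cols = dplyr::all_of(value_cols), # value_cols is a character vector
      names_to = "data_type",
      values_to = "value"
    )
  })

  dplyr::bind_rows(pieces)
}

# Toy data standing in for the mobility report (made-up values)
toy <- data.frame(
  location_code = c("US", "US", "CA", "CA", "MX"),
  parks = c(1L, 2L, 3L, 4L, 5L),
  workplaces = c(-1L, -2L, -3L, -4L, -5L)
)

toy_longer <- pivot_longer_chunked(toy, c("parks", "workplaces"), chunk_size = 2)
```

For the real data, `value_cols` would be the six `*_percent_change_from_baseline` column names. Each chunk is sliced by row index rather than with `split()` so that only one chunk is copied at a time, although the long pieces still all have to fit in memory before `dplyr::bind_rows` combines them.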
