pandas-gbq 0.16 release broke dask-bigquery CI #24

ncclementi · 2021-11-19T18:04:51Z

It looks like the most recent pandas-gbq release might have broken our tests. When writing to BigQuery like this

pd.DataFrame.to_gbq(
    df,
    destination_table=f"{dataset_id}.{table_id}",
    project_id=project_id,
    chunksize=5,
    if_exists="append",
)

with `pandas-gbq=0.15` and reading it back with `dask_bigquery.read_gbq`, we get 2 dask partitions, while if the writing is done with `pandas-gbq=0.16`, reading it back with `dask_bigquery.read_gbq` returns only 1 dask partition.

From the discussion on #11 we know that

pandas-gbq 0.16 changed the default intermediate data serialization format to parquet instead of CSV.
Likely this means the backend loader required fewer workers and wrote it to fewer files behind the scenes

Short term solution: pin pandas-gbq <= 0.15, or avoid asserting on ddf.npartitions.
Long term solution: avoid pandas-gbq and use bigquery.Client.load_table_from_dataframe, or something like this: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#loading_csv_data_into_a_table_that_uses_column-based_time_partitioning
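
For reference, a minimal sketch of what the long term option could look like in a test fixture. The helper name `write_test_table` and its arguments are hypothetical, and `client` is assumed to be a `google.cloud.bigquery.Client`. Note that this load path may also produce a single partition on read, so it is probably still worth not asserting on `ddf.npartitions`.

```python
# Sketch only: `write_test_table` is a hypothetical helper, not part of
# dask-bigquery. `client` is assumed to be a google.cloud.bigquery.Client
# and `df` a pandas DataFrame.
#
# Assumed setup (requires google-cloud-bigquery and pyarrow):
#   from google.cloud import bigquery
#   client = bigquery.Client(project=project_id)


def write_test_table(client, df, project_id, dataset_id, table_id):
    """Load `df` into project.dataset.table and wait for the load job."""
    destination = f"{project_id}.{dataset_id}.{table_id}"
    # load_table_from_dataframe starts a load job and returns it.
    job = client.load_table_from_dataframe(df, destination)
    # result() blocks until the job finishes and raises if it failed.
    job.result()
    return destination
```

In a test this would replace the `pd.DataFrame.to_gbq(...)` call above, e.g. `write_test_table(client, df, project_id, dataset_id, table_id)`.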


ncclementi self-assigned this Nov 19, 2021

ncclementi mentioned this issue Nov 19, 2021

pin pandas-gbq to fix CI #25

Merged

fjetter added the bug label Nov 29, 2021

ncclementi mentioned this issue Dec 14, 2021

Remove pandas-gbq from testing #31

Merged

1 task

jrbourbeau closed this as completed in #31 Dec 16, 2021
