It looks like the most recent update of `pandas-gbq` might have broken our tests. When writing data to BigQuery with `pandas-gbq==0.15` and reading it back with `dask_bigquery.read_gbq`, we get 2 Dask partitions, while if the writing is done with `pandas-gbq==0.16`, reading back with `dask_bigquery.read_gbq` returns only 1 Dask partition.
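A minimal sketch of the round trip the tests assert on (the project, dataset, and table names below are placeholders, not the actual test fixtures):

```python
import pandas as pd
import pandas_gbq
import dask_bigquery

project_id = "my-project"      # placeholder
dataset_id = "my_dataset"      # placeholder
table_id = "partition_test"    # placeholder

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "number": [1, 2, 3, 4]})

# Write with pandas-gbq: 0.15 uploads via CSV, 0.16 defaults to parquet.
pandas_gbq.to_gbq(
    df,
    destination_table=f"{dataset_id}.{table_id}",
    project_id=project_id,
    if_exists="replace",
)

# Read back with dask-bigquery and inspect the partition count.
ddf = dask_bigquery.read_gbq(
    project_id=project_id,
    dataset_id=dataset_id,
    table_id=table_id,
)
print(ddf.npartitions)  # 2 under pandas-gbq 0.15, 1 under 0.16
```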
From the discussion on #11 we know that:

- `pandas-gbq` 0.16 changed the default intermediate data serialization format to parquet instead of CSV.
- Likely this means the backend loader required fewer workers and wrote the data to fewer files behind the scenes.
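For reference, a hedged sketch: assuming the `api_method` keyword that `pandas-gbq` 0.16 introduced for choosing the upload format, the old CSV load path can still be requested explicitly (we have not verified whether that also restores the 2-partition read):

```python
import pandas as pd
import pandas_gbq

df = pd.DataFrame({"name": ["a", "b"], "number": [1, 2]})

# Assumption: api_method="load_csv" forces the pre-0.16 CSV serialization
# instead of the new parquet default. Table/project names are placeholders.
pandas_gbq.to_gbq(
    df,
    destination_table="my_dataset.partition_test",
    project_id="my-project",
    if_exists="replace",
    api_method="load_csv",
)
```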
Short term solution:

- pin `pandas-gbq <= 0.15`
- or avoid asserting on `ddf.npartitions` (see the sketch below)
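One way to drop the partition-count assertion while still checking the round trip (a sketch; `df` and `ddf` are the frames from the reproduction above):

```python
import pandas as pd

# Assert on the data rather than on npartitions, since the partition count
# depends on how pandas-gbq happened to serialize the upload.
result = ddf.compute().sort_values("number").reset_index(drop=True)
expected = df.sort_values("number").reset_index(drop=True)
pd.testing.assert_frame_equal(result, expected, check_like=True)
```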
Long term solution:

- avoid `pandas-gbq` and use `bigquery.Client.load_table_from_dataframe` (a sketch follows this list)
- or something like this: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#loading_csv_data_into_a_table_that_uses_column-based_time_partitioning
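A sketch of what the `load_table_from_dataframe` route could look like in the test setup (client, project, dataset, and table names are placeholders):

```python
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "number": [1, 2, 3, 4]})

# Load the DataFrame through the BigQuery client directly instead of pandas-gbq,
# so the tests are no longer coupled to pandas-gbq's upload defaults.
job = client.load_table_from_dataframe(
    df,
    destination="my_dataset.partition_test",
    job_config=bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE"),
)
job.result()  # block until the load job completes
```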