SNOW-640837: pd_writer regression with Pandas 1.3 -> 1.4 #1220
Comments
Any news on this? |
Bump |
Something curious happens here: if you use a new connection, the result is empty (even without temporary tables).

```python
import pandas as pd
from functools import partial
from snowflake.connector.pandas_tools import pd_writer
import logging

logging.basicConfig(level="INFO")

TABLE = "xxx"
SCHEMA = "yyy"

with get_sqlalchemy_engine().connect() as conn:
    conn.execute(f"USE SCHEMA {SCHEMA}")  # This is needed for pd_writer to work
    pd.DataFrame({"a": [42]}).to_sql(
        TABLE,
        schema=SCHEMA,
        con=conn,
        index=False,
        if_exists="append",
        # see https://github.com/snowflakedb/snowflake-connector-python/issues/513
        # Uncommenting this works as well
        method=partial(pd_writer, quote_identifiers=False),
    )
    # This works:
    df = pd.read_sql_table(TABLE, conn, schema=SCHEMA)
    assert len(df) == 1

with get_sqlalchemy_engine().connect() as conn:
    conn.execute(f"USE SCHEMA {SCHEMA}")  # This is needed for pd_writer to work
    # But this doesn't:
    df = pd.read_sql_table(TABLE, conn, schema=SCHEMA)
    assert len(df) == 1
```
|
Thanks for reaching out and giving us the example; we will take a look into this. |
Hey @jonashaag, I feel this is a regression in the method. Can you verify that? |
Well the first |
Thanks for the update. What does "breaks it" mean? Does it mean no data is inserted, or something else? In the meanwhile, I'm wondering if it's possible for you to do some experiments with a sqlite db and see if this is a general pandas issue; if so, we can create an issue in the pandas repo. |
Can you reproduce the issue? If so, why do you need me to do additional work? |
Apologies if there's any miscommunication; I thought you would be interested in the pandas behavior. |
I just want to call out that this is still on our radar and we will keep investigating. |
Thank you. I wonder why it takes almost a year to fix such a fundamental problem. |
Hello, is anyone from Snowflake looking into this? |
Hey @jonashaag, I'm going to dedicate some time today to dive into the code base and post updates for you. I know this has been hanging for a while; thanks for your patience. |
I did some more investigation. The way the connector saves a pandas dataframe into a table is: write the dataframe out as a Parquet file, upload that file to a stage with PUT, then load it into the target table with COPY INTO.
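For reference, a minimal sketch of that internal flow using a raw connector cursor; the stage, file, and table names below are illustrative placeholders (not the names the connector actually generates), and `conn` is assumed to be an open `snowflake.connector` connection:

```python
cur = conn.cursor()

# Stage the locally written Parquet file in a temporary stage
cur.execute("CREATE TEMPORARY STAGE my_temp_stage")
cur.execute("PUT file:///tmp/df.parquet @my_temp_stage")

# Load the staged Parquet into the target table, mapping Parquet
# columns to table columns by name
cur.execute(
    "COPY INTO my_table FROM @my_temp_stage "
    "FILE_FORMAT = (TYPE = PARQUET) MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
)
```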
I found that with pandas 1.3, the copy command can load data into the table; however, with 1.4, although the copy command query returns successfully, no data is loaded into the table. I verified the behavior with a non-temp table:

```python
with engine.connect() as conn:
    conn.execute(f"CREATE OR REPLACE TABLE {SCHEMA}.{TABLE} (a int)")
    conn.execute(f"USE SCHEMA {SCHEMA}")  # This is needed for pd_writer to work
    pd.DataFrame({"a": [42]}).to_sql(
        TABLE,
        schema=SCHEMA,
        con=conn,
        index=False,
        if_exists="append",
        # see https://github.com/snowflakedb/snowflake-connector-python/issues/513
        # Uncommenting this works as well
        method=partial(pd_writer, quote_identifiers=False),
    )
```

I also tried to verify the parquet file generated by pandas 1.4.4: if I manually copy the generated parquet into a table, the table does have the data. For now I have no clue why the copy command executed by the connector with pandas 1.4.4 returns success but loads no data; will keep investigating. |
Thanks! Note that this is also the case with all other pandas versions. |
Should we be using the same quoting workaround here as mentioned in #1420? |
I don't think it's due to the quoting; the parquet file is good, and I can manually copy the data into the table, so I don't think it's related to quoted identifiers. Actually, I found something very interesting when checking the log around the copy command: in pandas 1.3, I saw a COMMIT right after it:

```
DEBUG:snowflake.connector.cursor:running query [COPY INTO ... /* Python:snowflake.connector.pandas_tools.writ...]
...
DEBUG:snowflake.connector.cursor:binding: [COMMIT] with input=[None], processed=[{}]
INFO:snowflake.connector.cursor:query: [COMMIT]
DEBUG:snowflake.connector.connection:sequence counter: 8
DEBUG:snowflake.connector.cursor:Request id: xxx
DEBUG:snowflake.connector.cursor:running query [COMMIT]
DEBUG:snowflake.connector.cursor:is_file_transfer: True
DEBUG:snowflake.connector.connection:_cmd_query
DEBUG:snowflake.connector._query_context_cache:serialize_to_json() called
DEBUG:snowflake.connector.connection:sql=[COMMIT], sequence_id=[8], is_file_transfer=[False]
DEBUG:snowflake.connector.network:Session status for SessionPool 'xxx.snowflakecomputing.com', SessionPool 1/1 active sessions
DEBUG:snowflake.connector.network:remaining request timeout: None, retry cnt: 1
DEBUG:snowflake.connector.network:Request guid:
DEBUG:snowflake.connector.network:socket timeout: 60
DEBUG:snowflake.connector.vendored.urllib3.connectionpool:https://xxx.snowflakecomputing.com:443 "POST /queries/v1/query-request?requestId= HTTP/1.1" 200 1149
DEBUG:snowflake.connector.network:SUCCESS
DEBUG:snowflake.connector.network:Session status for SessionPool 'xxx.snowflakecomputing.com', SessionPool 0/1 active sessions
DEBUG:snowflake.connector.network:ret[code] = None, after post request
DEBUG:snowflake.connector.network:Query id:
DEBUG:snowflake.connector.cursor:sfqid:
INFO:snowflake.connector.cursor:query execution done
DEBUG:snowflake.connector.cursor:SUCCESS
DEBUG:snowflake.connector.cursor:PUT OR GET: False
DEBUG:snowflake.connector.cursor:Query result format: json
DEBUG:snowflake.connector.result_batch:parsing for result batch id: 1
INFO:snowflake.connector.cursor:Number of results in first chunk: 1
DEBUG:snowflake.connector.connection:cursor
DEBUG:snowflake.connector.cursor:executing SQL/command
...
```

However, I don't see similar COMMIT logging after the execution of the copy command in pandas 1.4, which means the data is never committed. I think we're close to finding the root cause and the solution. |
Alright, I finally found one workaround: after adding an explicit COMMIT, I can now copy the parquet into the table and the data is there. The questions now become why the COMMIT is no longer issued with pandas 1.4, and where the fix should live. I will keep looking. |
As a workaround, you could try adding an explicit commit:

```python
with get_sqlalchemy_engine().connect() as conn:
    conn.execute(f"USE SCHEMA {SCHEMA}")  # This is needed for pd_writer to work
    pd.DataFrame({"a": [42]}).to_sql(
        TABLE,
        schema=SCHEMA,
        con=conn,
        index=False,
        if_exists="append",
        # see https://github.com/snowflakedb/snowflake-connector-python/issues/513
        # Uncommenting this works as well
        method=partial(pd_writer, quote_identifiers=False),
    )
    conn.execute('commit')
```

I'm still comparing the behavior diff between pandas 1.3 & 1.4. |
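For those not going through SQLAlchemy, a sketch of the same workaround with the raw connector's DB-API transaction controls; the credentials are placeholders, and the target table is assumed to already exist in the current schema:

```python
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Placeholder credentials
conn = snowflake.connector.connect(account="...", user="...", password="...")

# Option 1: enable autocommit so every statement commits on its own
conn.autocommit(True)

# Option 2: keep explicit transactions and commit after the write
write_pandas(conn, pd.DataFrame({"a": [42]}), "MY_TABLE")
conn.commit()  # without this, an open transaction can be rolled back when the connection closes
```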
Seeing this issue with a lack of committing changes too (all other functionality works as expected, for example auto-creating a previously non-existing table) using `write_pandas`. Data is written correctly, but never committed. Resolved, without hacking, by wrapping the call in a transaction block:

```python
with engine.begin() as conn:
    # call to `write_pandas`
    ...
    # other code
```
|
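A more complete version of that pattern, reusing the `to_sql` call and the `get_sqlalchemy_engine`, `TABLE`, and `SCHEMA` placeholders from earlier in the thread:

```python
import pandas as pd
from functools import partial
from snowflake.connector.pandas_tools import pd_writer

# engine.begin() opens a transaction that is committed when the block
# exits normally (and rolled back on an exception), so no manual COMMIT
# is needed.
with get_sqlalchemy_engine().begin() as conn:
    conn.execute(f"USE SCHEMA {SCHEMA}")
    pd.DataFrame({"a": [42]}).to_sql(
        TABLE,
        schema=SCHEMA,
        con=conn,
        index=False,
        if_exists="append",
        method=partial(pd_writer, quote_identifiers=False),
    )
```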
FWIW we abandoned using this API altogether and replaced it with a two-step approach where we first upload the dataframe as Parquet to a temporary stage and then load that Parquet. Much more reliable than using this API, which no one seems to care enough about to fix. |
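That two-step approach can look roughly like the sketch below; the stage name, helper name, and commit handling are assumptions, not the commenter's actual code, and writing Parquet requires pyarrow or fastparquet:

```python
import tempfile

import pandas as pd
import snowflake.connector


def write_df_via_stage(conn: snowflake.connector.SnowflakeConnection,
                       df: pd.DataFrame, table: str) -> None:
    """Upload a dataframe as Parquet to a temporary stage, then COPY it in."""
    with tempfile.NamedTemporaryFile(suffix=".parquet") as tmp:
        df.to_parquet(tmp.name, index=False)
        cur = conn.cursor()
        cur.execute("CREATE TEMPORARY STAGE IF NOT EXISTS df_stage")
        cur.execute(f"PUT file://{tmp.name} @df_stage OVERWRITE = TRUE")
        cur.execute(
            f"COPY INTO {table} FROM @df_stage "
            "FILE_FORMAT = (TYPE = PARQUET) MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
        )
        conn.commit()  # make sure the load is not rolled back
```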
Reproducer:
It works in Pandas 1.3 (specifically, before this PR: pandas-dev/pandas#43116)
Logs attached.
log-pandas-13.txt
log-pandas-14.txt