SNOW-735220: pd_writer example from documentation not working #1420
If I use an ALL_UPPERCASE table name I get the following instead:

If I pass
I am also having a similar issue. I created this test to illustrate:

```python
import pandas as pd
from snowflake.connector.pandas_tools import pd_writer
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine

account = "xxx"
user = "xxx"
password = "xxx"


def test_snowflake_pd_writer():
    url = URL(
        account=account,
        user=user,
        password=password)
    engine = create_engine(url=url)
    try:
        database = "PD_WRITER_DB"
        sql = f"CREATE DATABASE {database}"
        with engine.connect() as connection:
            connection.execute(sql)
        url = URL(
            account=account,
            user=user,
            password=password,
            database=database,
            schema="PUBLIC")
        engine = create_engine(url=url)

        # upper case columns
        df = pd.DataFrame({
            "AA": [1, 1, 1],
            "BB": [2, 2, 2],
            "CC": [3, 3, 3]})
        # both of the following create the columns in uppercase
        # and populate the new tables
        df.to_sql(name="UPPER_NO_PD_WRITER", con=engine, index=False)
        df.to_sql(name="UPPER_PD_WRITER", con=engine, method=pd_writer, index=False)

        # mixed case columns
        df = pd.DataFrame({
            "Aa": [1, 1, 1],
            "Bb": [2, 2, 2],
            "Cc": [3, 3, 3]})
        # both of the following create the columns in mixed case
        # and populate the new tables
        df.to_sql(name="MIXED_NO_PD_WRITER", con=engine, index=False)
        df.to_sql(name="MIXED_PD_WRITER", con=engine, method=pd_writer, index=False)

        # lower case columns
        df = pd.DataFrame({
            "aa": [1, 1, 1],
            "bb": [2, 2, 2],
            "cc": [3, 3, 3]})
        # the following df.to_sql creates the table, but the columns in the
        # new table have upper case names: AA, BB, CC
        # is there a way to create the column names as lowercase?
        # the columns are populated with the values from the dataframe
        df.to_sql(name="LOWER_NO_PD_WRITER", con=engine, index=False)
        # the following df.to_sql creates the table, but the columns in the
        # new table have upper case names: AA, BB, CC
        # it fails with the error: invalid identifier '"aa"'
        # no values are populated
        df.to_sql(name="LOWER_PD_WRITER", con=engine, method=pd_writer, index=False)
    finally:
        with engine.connect() as connection:
            connection.execute(f"DROP DATABASE IF EXISTS {database}")
```
We will look at this issue this week. FYI @sfc-gh-stan

Hi all, thanks for opening this issue. I could reproduce the error and will be working on a bug fix. Assigning this to myself.

I looked into this further; this happens because when SQLAlchemy creates the table, it uses the query
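Below is a hedged reconstruction of the mismatch being described, using the lowercase table from the test case above; the statements are illustrative, not verbatim connector output.

```python
# SQLAlchemy leaves all-lowercase column names unquoted in its CREATE TABLE,
# so Snowflake folds them to uppercase (AA, BB, CC) when the table is created:
create_stmt = "CREATE TABLE lower_pd_writer (aa BIGINT, bb BIGINT, cc BIGINT)"

# pd_writer/write_pandas then references the DataFrame's column names in
# double quotes, i.e. as case-sensitive identifiers, which no longer match:
copy_stmt = 'COPY INTO lower_pd_writer ("aa", "bb", "cc") FROM ...'
# -> SQL compilation error: invalid identifier '"aa"'
```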
I will look into #1220 for now. FYI, the documented example for

Any news on this?
Thanks for looking into this problem, but your proposed workaround doesn't handle the case where the column name is lowercase and case-sensitive. Our application supports Snowflake connections, so we have no control over table names and have to handle our customers' tables as-is. In order to fully support Snowflake casing options, we've had to build fairly elaborate workarounds for this bug and for snowflakedb/snowflake-sqlalchemy#388.

Hello, is anyone from Snowflake looking into this?

I see how setting
Hi all, with #1594, you should be able to work around this behavior by quoting the columns in your pandas DataFrames:
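A minimal sketch of that workaround, assuming the lowercase DataFrame and the SQLAlchemy `engine` from the test case earlier in this thread:

```python
import pandas as pd
from snowflake.connector.pandas_tools import pd_writer

df = pd.DataFrame({"aa": [1, 1, 1], "bb": [2, 2, 2], "cc": [3, 3, 3]})

# Wrap each column name in double quotes so that Snowflake treats it as a
# case-sensitive identifier matching the table's lowercase column names.
df.columns = [f'"{c}"' for c in df.columns]

# `engine` is assumed to be the engine created in the test case above.
df.to_sql(name="LOWER_PD_WRITER", con=engine, method=pd_writer, index=False)
```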
Is this what you are going to recommend in the documentation?

Sure, I will include it in the documentation of

Honestly, I think it's a terrible workaround. It shouldn't require manual quoting of column names to be able to upload a dataframe. Can we at least move the quoting logic inside the Snowflake adapter? So when, at some point in the future, this is properly fixed in Snowflake/the adapter, people don't need to remove this workaround from their code bases.

@jonashaag, I think this is a documented behavior of Snowflake rather than a bug in the connector. The problem is when
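For context, a small illustration of that documented identifier behavior (my own example with placeholder credentials, not taken from the thread): unquoted identifiers are case-insensitive and resolve to uppercase, while quoted identifiers are case-sensitive.

```python
import snowflake.connector

# Placeholder credentials, assumed to point at a scratch database.
con = snowflake.connector.connect(account="xxx", user="xxx", password="xxx",
                                  database="PD_WRITER_DB", schema="PUBLIC")
cur = con.cursor()
cur.execute("CREATE TABLE t (aa INT)")  # unquoted: stored as uppercase AA
cur.execute("SELECT aa FROM t")         # works, resolves case-insensitively
cur.execute('SELECT "aa" FROM t')       # fails: invalid identifier '"aa"'
```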
Ok, but since we already have special code in the connector to support pandas DataFrames, don't you think a good place for Snowflake-specific column name handling would be inside the connector as well? Otherwise we have code in the connector to deal with pandas DataFrames that doesn't work unless you make specific changes to your DataFrame. It's just not nice from a developer-experience point of view.

df.to_sql() doesn't appear to have a mechanism to change table-create behavior, but internally it uses SQLAlchemy to create the table. So maybe this could be further resolved in snowflake-sqlalchemy? There, I think, you could define specific SQLAlchemy dialect rules so that the quoting is handled automatically at the SQLAlchemy level, rather than leaking this abstraction up to the pandas call. I agree with @jonashaag that this fix is not ideal.
I see your concerns with this workaround. After digging into snowflake-sqlalchemy a bit more, I found that identifiers with a mix of lowercase and uppercase characters are actually quoted automatically. However, identifiers with lowercase characters only are not, which causes the issue here.
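A rough model of the rule being described (my paraphrase, not the dialect's actual source):

```python
def is_auto_quoted(name: str) -> bool:
    # Mixed-case identifiers get double quotes and keep their exact casing;
    # all-lowercase identifiers are left unquoted, so Snowflake folds them
    # to uppercase. The latter is what breaks the lowercase-only case here.
    return name != name.lower()

assert is_auto_quoted("Aa")      # mixed case: quoted, casing preserved
assert not is_auto_quoted("aa")  # lowercase only: unquoted, folded to AA
```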
Thanks, and I can see you're basically at a dead end for options in this repo, so if you want to carry on this conversation somewhere else, just lmk. Is there a PM responsible for this part of Snowflake's tooling who can make a call on what to do here? I completely understand not wanting to break anything as a maintainer, but generally it seems like an odd decision on Snowflake's part to officially not support official Snowflake features in their own tooling. Even if they decide not to do more to handle this problem, there should at least be documented disclaimers about what is and what is not supported, to save devs the trouble of rediscovering these rough edge cases. Also, I haven't explored in detail, but I think there are ways to add new IdentifierPreparer logic on an opt-in basis. It's possible to pass in arguments via create_engine:

```python
engine = create_engine(
    'snowflake://user:password@localhost/test',
    strict_quotes=True
)
# or
engine = create_engine(
    'snowflake://user:password@localhost/test?strict_quotes=true'
)
```
I agree, handling of case sensitivity should definitely be improved in the Python connector/SQLAlchemy dialect. One of the first things one has to do when adding Snowflake support to a Python application or library is to add some hacks to make it work with column name casing. There are likely dozens of different hacks in the various codebases that have the privilege of having to deal with these problems, and each of those hacks is incorrect and incomplete in a different way, I'm sure.

Any news on this? Will you be leaving incorrect documentation forever?

Hello?
Hi @jonashaag, apologies for the delayed response. We have updated the docstring to note the limitation as well as the workaround: snowflake-connector-python/src/snowflake/connector/pandas_tools.py, lines 463 to 481 in 00bcd55.

Thanks for sharing your thoughts and feedback on the issue. I agree that in the long term this should be fixed in the SQLAlchemy library.
Will you also fix the documentation here? https://docs.snowflake.com/developer-guide/python-connector/python-connector-api#pd_writer

Are you guys fine with incorrect documentation? This issue is slowly becoming a meme.

Hi @jonashaag, thanks for the reminder regarding the adjustments needed for the official documentation; we're working with the doc team to get this fixed: https://github.com/snowflakedb/snowflake-prod-docs/pull/4378

Thank you! That repo doesn't seem to be public, but I'm looking forward to seeing the fix deployed!

Seems to be fixed, thank you!
Please answer these questions before submitting your issue. Thanks!

What version of Python are you using?
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0]

What operating system and processor architecture are you using?
Linux-5.4.0-1094-aws-x86_64-with-glibc2.31

What are the component versions in the environment (pip freeze)? I also tried with snowflake-sqlalchemy 1.4.5

What did you do?
Run this example from the documentation

What did you expect to see?
Expected the example to work, but got