
SNOW-735220: pd_writer example from documentation not working #1420

Closed
jonashaag opened this issue Jan 31, 2023 · 28 comments

@jonashaag

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0]

  2. What operating system and processor architecture are you using?

Linux-5.4.0-1094-aws-x86_64-with-glibc2.31

  3. What are the component versions in the environment (pip freeze)?
mamba list | grep -e snow -e pandas
pandas                    1.5.2           py310h769672d_0    https://conda.anaconda.org/conda-forge
snowflake-connector-python 2.8.3           py310hea2c3ac_0    https://conda.anaconda.org/conda-forge
snowflake-sqlalchemy      1.3.4              pyhd8ed1ab_0    https://conda.anaconda.org/conda-forge

I also tried with snowflake-sqlalchemy 1.4.5

  4. What did you do?

Run this example from the documentation

import pandas
from snowflake.connector.pandas_tools import pd_writer

# Create a DataFrame containing data about customers
df = pandas.DataFrame([('Mark', 10), ('Luke', 20)], columns=['name', 'balance'])

# Specify that the to_sql method should use the pd_writer function
# to write the data from the DataFrame to the table named "customers"
# in the Snowflake database.
df.to_sql('customers', engine, index=False, method=pd_writer)
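
For context, the example references an engine created separately. A typical setup with snowflake-sqlalchemy, using placeholder account and credentials, would look roughly like:

from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine

# Placeholder connection parameters, for illustration only.
engine = create_engine(URL(
    account='my_account',
    user='my_user',
    password='my_password',
    database='testdb',
    schema='public',
))
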
  5. What did you expect to see?

Expected the example to work, but got

snowflake.connector.errors.ProgrammingError: 000904 (42000): 01aa0467-3201-91be-0000-1e695e9861c6: SQL compilation error: error line 1 at position 92
invalid identifier '"name"'

  6. Can you set logging to DEBUG and collect the logs?

/

@github-actions bot changed the title from "pd_writer example from documentation not working" to "SNOW-735220: pd_writer example from documentation not working" on Jan 31, 2023
@jonashaag
Author

If I use an ALL_UPPERCASE table name I get the following instead

sqlalchemy.exc.InvalidRequestError: Could not reflect: requested table(s) not available in Engine(...) schema '...': (MY_TABLE_NAME)

@jonashaag
Author

If I pass quote_identifiers=False, I hit #1220.

@michaelkwagner

michaelkwagner commented Feb 14, 2023

I am also having a similar issue. I created this test to illustrate:

import pandas as pd
from snowflake.connector.pandas_tools import pd_writer
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine

account = "xxx"
user = "xxx"
password = "xxx"

def test_snowflake_pd_writer():
    url = URL(
        account=account,
        user=user,
        password=password)
    engine = create_engine(url=url)

    try:
        database = "PD_WRITER_DB"
        sql = f'CREATE DATABASE {database} '
        with engine.connect() as connection:
            connection.execute(sql)

        url = URL(
            account=account,
            user=user,
            password=password,
            database=database,
            schema="PUBLIC")
        engine = create_engine(url=url)

        # upper case columns
        df = pd.DataFrame({
            "AA": [1, 1, 1],
            "BB": [2, 2, 2],
            "CC": [3, 3, 3]})

        # both of the following create the columns in uppercase and populate the new tables
        df.to_sql(name="UPPER_NO_PD_WRITER", con=engine, index=False)
        df.to_sql(name="UPPER_PD_WRITER", con=engine, method=pd_writer, index=False)

        # mixed case columns
        df = pd.DataFrame({
            "Aa": [1, 1, 1],
            "Bb": [2, 2, 2],
            "Cc": [3, 3, 3]})

        # both of the following create the columns in mixed case and populate the new tables            
        df.to_sql(name="MIXED_NO_PD_WRITER", con=engine, index=False)
        df.to_sql(name="MIXED_PD_WRITER", con=engine, method=pd_writer, index=False)

        # lower case columns
        df = pd.DataFrame({
            "aa": [1, 1, 1],
            "bb": [2, 2, 2],
            "cc": [3, 3, 3]})
            
        # following df.to_sql creates table but columns in new table are upper case column names: AA, BB, CC
        # is there a way to create the column names as lowercase? 
        # the columns are populated with the values from the dataframe
        df.to_sql(name="LOWER_NO_PD_WRITER", con=engine, index=False)
        
        # following df.to_sql creates table but columns in new table are upper case column names: AA, BB, CC
        # fails with the error: invalid identifier '"aa"'
        # no values are populated.
        df.to_sql(name="LOWER_PD_WRITER", con=engine, method=pd_writer, index=False)
    finally:
        with engine.connect() as connection:
            connection.execute(f'DROP DATABASE IF EXISTS {database} ')

@sfc-gh-achandrasekaran
Contributor

We will look at this issue this week. FYI @sfc-gh-stan

@sfc-gh-stan
Contributor

Hi all, thanks for opening this issue. I could reproduce the error and will be working on a bug fix. Assigning this to myself.

@sfc-gh-stan
Contributor

I looked into this further. The cause is that when SQLAlchemy creates the table, it uses the query 'CREATE TABLE customers ( name TEXT, balance BIGINT )', but pd_writer defaults to quoting the identifier names and uppercasing the table name. Snowflake therefore finds the correct table (customers and "CUSTOMERS" match) but cannot find the columns (unquoted name resolves to NAME, which does not match "name"). Specifying quote_identifiers=False should work around this; I didn't hit #1220 when running df.to_sql('new_customers', engine, index=False, method=lambda a,b,c,d: pd_writer(a,b,c,d, quote_identifiers=False)) with

pandas==1.5.2
...
snowflake-connector-python==2.8.3
snowflake-sqlalchemy==1.4.3
SQLAlchemy==1.4.47

I will look into #1220 for now.

FYI, the documented example for pd_writer actually works because the column names are specified in uppercase; the example used here is from write_pandas.
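
For reference, an equivalent way to spell that workaround with functools.partial instead of a lambda (same assumptions as above) would be:

from functools import partial

from snowflake.connector.pandas_tools import pd_writer

# Disable identifier quoting in pd_writer, so the COPY INTO step matches the
# unquoted (uppercase-folded) column names SQLAlchemy used when creating the table.
df.to_sql(
    'new_customers',
    engine,
    index=False,
    method=partial(pd_writer, quote_identifiers=False),
)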

@jonashaag
Author

Any news on this?

@willsthompson

Thanks for looking into this problem, but your proposed workaround doesn't handle the case where the column name is lowercase and case-sensitive. Our application supports Snowflake connections, so we have no control over table names and have to handle our customers' tables as-is. In order to fully support Snowflake casing options, we've had to build fairly elaborate workarounds for this bug and for snowflakedb/snowflake-sqlalchemy#388.

@jonashaag
Author

Hello, is anyone from Snowflake looking into this?

@sfc-gh-stan
Contributor

I see how setting quote_identifiers to False does not work for column names that are case-sensitive. I will take a look again soon and see what the best approach is here.

@sfc-gh-stan
Contributor

sfc-gh-stan commented Jun 7, 2023

Hi all, with #1594, you should be able to work around this behavior by enquoting the columns in your pandas DataFrames:

df = pandas.DataFrame([('Mark', 10), ('Luke', 20)], columns=['"name"', '"balance"'])
df.to_sql('customers', engine, index=False, method=pd_writer)
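
If the DataFrame already exists with plain lowercase column names, the same workaround can be applied programmatically, e.g. (a sketch that treats every column as case-sensitive):

# Wrap each column name in double quotes so that the quoted identifiers used by
# pd_writer match the quoted, case-sensitive columns of the created table.
df.columns = [f'"{col}"' for col in df.columns]
df.to_sql('customers', engine, index=False, method=pd_writer)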

@jonashaag
Author

Is this what you are going to recommend in the documentation?

@sfc-gh-stan
Contributor

Is this what you are going to recommend in the documentation?

Sure, I will include it in the documentation of pd_writer.

@jonashaag
Author

Honestly, I think it's a terrible workaround. It shouldn't require manual quoting of column names to be able to upload a dataframe.

Can we at least move the quoting logic inside the Snowflake adapter? So when at some point in the future this is properly fixed in Snowflake/the adapter, people don't need to remove this workaround from their code bases.

@sfc-gh-stan
Contributor

@jonashaag, I think this is documented behavior of Snowflake rather than a bug in the connector. The problem is that when pandas generates the query to create the table, it uses the DataFrame's column names directly, but Snowflake requires case-sensitive identifiers to be double-quoted.
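
A simplified illustration of the mismatch (not the exact SQL the libraries emit):

# 1) Table creation (pandas/SQLAlchemy): identifiers are left unquoted, so
#    Snowflake folds them to uppercase; the columns are stored as NAME, BALANCE.
create_stmt = "CREATE TABLE customers (name TEXT, balance BIGINT)"

# 2) Data load (pd_writer): identifiers are double-quoted, so they stay lowercase
#    and case-sensitive; Snowflake looks for a column literally named "name",
#    which does not exist in the table created above.
copy_stmt = 'COPY INTO customers ("name", "balance") FROM ...'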

@jonashaag
Author

jonashaag commented Jun 8, 2023

Ok, but since we have special code to support Pandas dataframes in the connector already, don't you think a good place for Snowflake specific column name handling would be inside the connector as well? Otherwise we have code in the connector to deal with Pandas dataframes that doesn't work unless you make specific changes to your dataframe. It's just not nice from a developer experience point of view.

@willsthompson

willsthompson commented Jun 8, 2023

df.to_sql() doesn't appear to have a mechanism to change table create behavior, but internally it uses SQLAlchemy to create the table. So maybe this could be further resolved in snowflake-sqlalchemy? There, I think, you could define specific SQLAlchemy dialect rules so that the quoting is handled automatically at the SQLAlchemy level, rather than leaking this abstraction up to the pandas call. I agree with @jonashaag that this fix is not ideal.

@sfc-gh-stan
Contributor

sfc-gh-stan commented Jun 14, 2023

I see your concerns with this workaround. After digging into snowflake-sqlalchemy a bit more, I found that identifiers containing a mix of lowercase and uppercase characters are actually enquoted automatically; however, identifiers consisting of lowercase characters only are not, which causes the issue here.
I agree this should ideally be fixed in snowflake-sqlalchemy with a custom IdentifierPreparer or similar, but that would be a breaking change with a potentially huge blast radius.
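
For the record, the kind of change being discussed would look roughly like the sketch below; it uses SQLAlchemy's IdentifierPreparer hook for illustration only and is not an actual snowflake-sqlalchemy patch:

from sqlalchemy.sql.compiler import IdentifierPreparer

# Hypothetical preparer that quotes every identifier, so all-lowercase column
# names would be created as case-sensitive identifiers. Swapping something like
# this into the Snowflake dialect is exactly the breaking change mentioned above.
class AlwaysQuotePreparer(IdentifierPreparer):
    def _requires_quotes(self, value):
        return True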

@willsthompson

Thanks, and I can see you're basically at a dead end for options in this repo, so if you want to carry on this conversation somewhere else just lmk.

Is there a PM responsible for this part of Snowflake's tooling who can make a call on what to do here? I completely understand not wanting to break anything as a maintainer, but it generally seems like an odd decision on Snowflake's part not to support official Snowflake features in their own tooling. Even if they decide not to do more to handle this problem, there should at least be documented disclaimers about what is and is not supported, to save devs the trouble of rediscovering these rough edge cases.

Also, I haven't explored this in detail, but I think there are ways to add new IdentifierPreparer logic on an opt-in basis. It's possible to pass arguments from create_engine() that will be received by the dialect class. So IIUC you could still keep the current behavior as the default and allow users to opt into alternative logic via something like

engine = create_engine(
    'snowflake://user:password@localhost/test', 
    strict_quotes=True
)
# or
engine = create_engine(
    'snowflake://user:password@localhost/test?strict_quotes=true'
)
@jonashaag
Author

I agree, handling of case sensitivity should definitely be improved in the Python connector/SQLAlchemy dialect. One of the first things one has to do when adding Snowflake support to a Python application or library is to add some hacks to make it work with column name casing. There are likely dozens of different hacks in the various codebases that have the privilege of having to deal with these problems, and I'm sure each of those hacks is incorrect and incomplete in a different way.

@jonashaag
Author

jonashaag commented Jul 10, 2023

Any news on this? Will you be leaving incorrect documentation forever?

@jonashaag
Author

Hello?

@sfc-gh-aling
Collaborator

hi @jonashaag, apologies for the delayed response.

we have updated the docstring to note the limitation as well as the workaround:

Notes:
Please note that when the column names in the pandas DataFrame consist of strictly lowercase letters, the column names need to
be enquoted, otherwise a `ProgrammingError` will be raised.
This is because `snowflake-sqlalchemy` does not enquote lowercase column names when creating the table, but `pd_writer` enquotes the columns by default,
and the COPY INTO command looks for enquoted column names.
Future improvements will be made in the snowflake-sqlalchemy library.
Example usage:
import pandas as pd
from snowflake.connector.pandas_tools import pd_writer
sf_connector_version_df = pd.DataFrame([('snowflake-connector-python', '1.0')], columns=['NAME', 'NEWEST_VERSION'])
sf_connector_version_df.to_sql('driver_versions', engine, index=False, method=pd_writer)
# when the column names consist of only lowercase letters, enquote the column names
sf_connector_version_df = pd.DataFrame([('snowflake-connector-python', '1.0')], columns=['"name"', '"newest_version"'])
sf_connector_version_df.to_sql('driver_versions', engine, index=False, method=pd_writer)

thanks for sharing your thoughts and feedback on the issue. I agree that in the long term this should be fixed in the sqlalchemy library.

@jonashaag
Author

Are you guys fine with incorrect documentation?

This issue is slowly becoming a meme

@sfc-gh-aling
Collaborator

hi @jonashaag, thanks for the reminder regarding the adjustments needed for the official documentation; we're working with the doc team to get this fixed: https://github.com/snowflakedb/snowflake-prod-docs/pull/4378

@jonashaag
Author

Thank you! That repo doesn't seem to be public but I'm looking forward to seeing the fix deployed!

@jonashaag
Author

Seems to be fixed, thank you!
