Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: Make test_sql.py parallelizable #60595

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

UmbertoFasci
Copy link
Contributor

Description:

Generated individual uuids for individual hardcoded table name instances throughout the test cases as per the original feature description. Handling the test cases for connections with a shared database state as those which leverage the iris, or type table defined and created in fixtures requires more refactoring if needed to make parallelizable.

On that same note, the parallel safety of these particular connections should be further tested to see if they do require unique table names.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???

@UmbertoFasci UmbertoFasci requested a review from a user December 20, 2024 19:55
pandas/tests/io/test_sql.py Show resolved Hide resolved
pandas/tests/io/test_sql.py Outdated Show resolved Hide resolved
pandas/tests/io/test_sql.py Outdated Show resolved Hide resolved
@mroeschke mroeschke removed the request for review from a user December 20, 2024 22:03
@UmbertoFasci
Copy link
Contributor Author

UmbertoFasci commented Dec 21, 2024

@WillAyd There seems to be an issue due to the way the drop_table function is currently set up when going through the future infer strings (without pyarrow) test. Simply put, when running the tests in parallel, the cleanup operations can fail because one test process might try to check a table's structure right after another process has already deleted that table.

I am thinking this can be handled by simply catching the error that is raised if the table has already been dropped.

@mroeschke
Copy link
Member

With this change, each test should truly create and drop a unique table. There may be some routines that need to adapt to this (either drop being called more consistently or tables names being pass through from the create to the drop process)

@ghost
Copy link

ghost commented Dec 21, 2024 via email

@rhshadrach rhshadrach added Testing pandas testing functions or related to the test suite IO SQL to_sql, read_sql, read_sql_query labels Dec 27, 2024
@WillAyd
Copy link
Member

WillAyd commented Dec 27, 2024

Yea sorry @UmbertoFasci I suppose this is a little more complicated than I originally envisioned. You might be able to have the individual fixtures generate both a connection and a table name instead of just a connection. That way the filter can just drop the table name at the end of the fixture, if it exists.

There might be some nuance with views but I'd start with tables first and see how far you get

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TST: Make test_sql.py parallelizable
4 participants