Load interview stats directly into a dataframe? #21

Open
nonprofittechy opened this issue Apr 16, 2023 · 1 comment

@nonprofittechy
Member

I'm wondering if we could save some overhead by loading from the database directly into a dataframe. As the number of rows grows, loading all records from the database is going to fail.

https://hackersandslackers.com/connecting-pandas-to-a-sql-database-with-sqlalchemy/
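For concreteness, here is a minimal sketch of what "directly into a dataframe" could look like with `pandas.read_sql_query`. The table name `interview_stats`, its columns, and the in-memory SQLite database are all assumptions made so the example is self-contained; the project's real schema and connection setup may differ.

```python
import sqlite3

import pandas as pd


def load_stats(conn, table="interview_stats"):
    """Read every row of `table` into a DataFrame in one call.

    `interview_stats` is a hypothetical table name used for illustration.
    """
    return pd.read_sql_query(f"SELECT * FROM {table}", conn)


# Stand-in database so the sketch runs on its own (assumption: the real
# project would pass its own connection or SQLAlchemy engine instead).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interview_stats "
    "(id INTEGER PRIMARY KEY, filename TEXT, elapsed REAL)"
)
conn.executemany(
    "INSERT INTO interview_stats (filename, elapsed) VALUES (?, ?)",
    [("a.yml", 1.5), ("b.yml", 2.5)],
)
conn.commit()

df = load_stats(conn)
print(df)
```

This skips building an intermediate list of Python objects, but the resulting dataframe still holds every row in memory.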

@BryceStevenWilley
Contributor

Are you hitting slowdowns or memory pressure now? How many rows do you have?

#10 made a lot of performance improvements, and if I recall correctly, for general operations we could work with 100k rows pretty easily. The commit notes in that PR say that we got up to 400k rows before we had to move the Excel generation into a background process.

Dataframes are loaded entirely into memory too, so if you're thinking of the point where simply loading all the rows will fail, the dataframe would start failing pretty quickly as well. Happy to do more performance work, but I'd rather not make a lot of superfluous changes that don't really help, and we'd need specific things to improve (like memory pressure or speed) at specific data sizes.
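One way to check the memory point at a given data size is to compare a full load against a chunked read. The sketch below is only illustrative: the `interview_stats` table, its columns, the row count, and the chunk size are made up, and it uses an in-memory SQLite database rather than the project's real one. Whether `chunksize` actually keeps the driver from buffering the whole result depends on the database driver in use.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in table with 1,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interview_stats "
    "(id INTEGER PRIMARY KEY, filename TEXT, elapsed REAL)"
)
conn.executemany(
    "INSERT INTO interview_stats (filename, elapsed) VALUES (?, ?)",
    [(f"interview_{i % 3}.yml", float(i)) for i in range(1000)],
)
conn.commit()

query = "SELECT * FROM interview_stats"

# Full load: the whole table lives in one DataFrame.
full = pd.read_sql_query(query, conn)
full_bytes = int(full.memory_usage(deep=True).sum())

# Chunked read: only one 100-row DataFrame is held at a time.
total_rows = 0
max_chunk_bytes = 0
for chunk in pd.read_sql_query(query, conn, chunksize=100):
    total_rows += len(chunk)
    chunk_bytes = int(chunk.memory_usage(deep=True).sum())
    max_chunk_bytes = max(max_chunk_bytes, chunk_bytes)

print(f"full load: {full_bytes} bytes for {len(full)} rows")
print(f"largest chunk: {max_chunk_bytes} bytes, {total_rows} rows seen")
```

Chunking only helps operations that can be computed incrementally (counts, sums, writing rows out); anything that needs the whole table at once still pays the full memory cost.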
