Load interview stats directly into a dataframe? #21

nonprofittechy · 2023-04-16T19:17:39Z

I'm wondering if we could save some overhead by loading from the database directly into a dataframe. As the number of rows grows, loading all records from the database is going to fail.

https://hackersandslackers.com/connecting-pandas-to-a-sql-database-with-sqlalchemy/
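
A minimal sketch of the idea, for reference. It uses the standard-library `sqlite3` driver rather than SQLAlchemy to stay self-contained, and the `interview_stats` table and its columns are hypothetical stand-ins, not the project's actual schema:

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the real stats table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interview_stats (filename TEXT, elapsed_seconds REAL)")
conn.executemany(
    "INSERT INTO interview_stats VALUES (?, ?)",
    [("intake.yml", 12.5), ("intake.yml", 30.0), ("eviction.yml", 45.0)],
)

# read_sql_query runs the query and returns the result as a DataFrame.
# The full result set is still held in memory once it is loaded.
stats_df = pd.read_sql_query(
    "SELECT filename, elapsed_seconds FROM interview_stats", conn
)
print(stats_df.groupby("filename")["elapsed_seconds"].mean())
```

`read_sql_query` also accepts a `chunksize` argument that yields DataFrames piece by piece, which may matter more than the loading path itself once the table gets large.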

BryceStevenWilley · 2023-04-17T04:35:58Z

Are you hitting slowdowns or memory pressure now? How many rows do you have?

#10 made a lot of performance improvements, and if I recall correctly, for general operations, we could work with 100k rows pretty easily. The commit notes in that PR say that we got up to 400k rows before we had to put the Excel generation into a background process.

Dataframes are loaded entirely into memory too, so if you're thinking of the point where simply loading all the rows will fail, the dataframe would start failing pretty quickly as well. Happy to do more performance work, but I'd rather not make a lot of superfluous changes that don't really help. We'd need specific things to improve (like memory pressure or speed), at specific data sizes.
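
One way to get numbers like that is sketched below, assuming a throwaway in-memory SQLite table with made-up columns (`interview_stats`, `filename`, `elapsed_seconds` are hypothetical, not the real schema). It records the peak Python-level memory used when fetching every row at once, at a few data sizes:

```python
import sqlite3
import tracemalloc


def peak_bytes_loading(row_count):
    """Return (rows_loaded, peak_bytes) for fetching row_count rows at once."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; the real stats schema will differ.
    conn.execute("CREATE TABLE interview_stats (filename TEXT, elapsed_seconds REAL)")
    conn.executemany(
        "INSERT INTO interview_stats VALUES (?, ?)",
        ((f"interview_{i % 50}.yml", float(i % 600)) for i in range(row_count)),
    )

    # Trace only the load step, not the table setup above.
    tracemalloc.start()
    rows = conn.execute(
        "SELECT filename, elapsed_seconds FROM interview_stats"
    ).fetchall()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    conn.close()
    return len(rows), peak


for size in (1_000, 10_000, 100_000):
    loaded, peak = peak_bytes_loading(size)
    print(f"{loaded:>7} rows: peak {peak / 1_000_000:.1f} MB")
```

`tracemalloc` only sees allocations made through Python's allocator, so this is a rough lower bound rather than the full process footprint. Running the same harness around a dataframe-based load would give a like-for-like comparison.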

nonprofittechy added the Performance label Apr 16, 2023
