Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advice on reducing memory usage of schedula for large Excel sheet from formulas #24

Open
scottgifford opened this issue Jan 23, 2023 · 0 comments

Comments

@scottgifford
Copy link

Hello,

After some early successes using formulas and schedula to calculate Excel LCAs, we have run into memory usage troubles when processing larger sheets. Some analysis shows that the memory is mostly in the formulas dsp property, which is a shcedula object, and I was hoping for some advice on what I could do to reduce memory usage there to support processing larger spreadsheets.

In a successful run on a host with 512GB of RAM, from a 5M-cell Excel file, I am able to load the 2.2M cells needed for my LCA calculation, with 165GB of total memory used. With logging I added, I can see that about 2GB is required to load the sheet into OpenPyXL, an additional 125GB to create the cells in formulas, an additional 10GB to .finish() the formulas object, and an additional 24GB to .compile() this into a function. For more details see excelsior-successful-run.txt.

In an unsuccessful run on a host with 512GB of RAM, from a 9.4M-cell Excel file, my program is killed by the Linux OOM killer after creating 8.8M cells and consuming 498GB of RAM. Logging indicates 5GB is used to load the Excel with OpenPyXL, then memory usage increases as formulas adds cells, until it is exhausted. For more details see excelsior-failed-run.txt.

My code to load the data and compile the function looks like this:

# out_ranges = ["'[/path/to/big-spreadsheet.xlsx]LOOKUPS'!E2"]
xl = formulas.ExcelModel()
xl.from_ranges(
    *out_ranges
)
xl.finish()

xl_func = xl.compile(
    inputs=formulas_input_mappings,
    outputs=formulas_output_mappings,
)

Any advice is appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant