Maximum call stack size exceeded on a large string, increasing the process' stack size works tho. #101

Open
klnvnv opened this issue Nov 25, 2018 · 3 comments

Comments

@klnvnv

klnvnv commented Nov 25, 2018

Hey man, great lib!

The stack blows up when trying to parse() some large strings, around a million characters or so.

I didn't look into how you're parsing the HTML string, so I don't know exactly what the reason is: too long a string, too many HTML elements, or too many levels in the tree. The files I've got are between one and two MB with ~5-6k elements, and the elements should be nested less than ten levels deep.

I'm on a Mac, where the default stack size is 8k. When I increase the stack for the node process to 64k, it works ok.

@andrejewski
Owner

A reproducible test case with data would be really helpful!
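For anyone who cannot share their real files, one way to approximate a test case is to generate synthetic HTML of a similar size. The sketch below is only an illustration: `buildHtml` is a hypothetical helper (not part of Himalaya), and the element count and nesting depth are guesses based on the numbers reported above, so it may well not trigger the same failure.

```javascript
// Hypothetical helper, not part of Himalaya: builds a synthetic HTML string
// with `count` table rows, each cell wrapping its text in `depth` nested <div>s.
function buildHtml(count, depth) {
  const open = '<div>'.repeat(depth);
  const close = '</div>'.repeat(depth);
  const rows = [];
  for (let i = 0; i < count; i++) {
    rows.push(`<tr><td>${open}cell ${i}${close}</td></tr>`);
  }
  return `<table>${rows.join('')}</table>`;
}

// Roughly the shape described in the report: thousands of elements, shallow nesting.
const html = buildHtml(6000, 5);
console.log(`generated ${html.length} characters`);

// To try it against the library (assumes himalaya is installed):
// const { parse } = require('himalaya');
// parse(html);
```

Increasing `count` or `depth` independently should help tell apart "too many elements" from "too many levels" as the cause.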

There are a few functions where recursion is used. From a quick reading, they all relate to nesting levels. I am surprised files this large are not >10 levels deep; this might indicate a parsing bug.

I'm glad you found a workaround. I think optimizing these recursion problems is worth some time.
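As a rough illustration of what removing such recursion can look like, here is a minimal sketch that walks a tree both ways. It is not Himalaya's actual code: the `{ children: [] }` node shape and the function names are assumptions made for the example. The recursive version uses one call-stack frame per nesting level, while the iterative one keeps its pending nodes in an ordinary array on the heap.

```javascript
// Recursive traversal: call-stack depth grows with the nesting level.
function countNodesRecursive(node) {
  let total = 1;
  for (const child of node.children || []) {
    total += countNodesRecursive(child);
  }
  return total;
}

// Iterative traversal: an explicit array replaces the call stack,
// so deep nesting costs heap memory instead of stack frames.
function countNodesIterative(root) {
  let total = 0;
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    total += 1;
    for (const child of node.children || []) {
      stack.push(child);
    }
  }
  return total;
}

// Builds a single chain of `depth` nodes, each nested inside the previous one.
function buildChain(depth) {
  const root = { children: [] };
  let current = root;
  for (let i = 1; i < depth; i++) {
    const next = { children: [] };
    current.children.push(next);
    current = next;
  }
  return root;
}

console.log(countNodesIterative(buildChain(200000)));
```

The iterative version should handle chains far deeper than the recursive one can, at the cost of slightly less readable code.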

It's also been a goal of mine for quite some time to make Himalaya stream-able such that the whole string doesn't need to be loaded into memory at once. That's a much larger change, so maybe fixing these recursions can buy me some more time.

@klnvnv
Author

klnvnv commented Nov 25, 2018 via email

@andrejewski
Owner

Unfortunately I don't think I can reproduce it on my own. I also played around with some nested divs and it handled that fine.

If I had to guess, since you mention tables, there could be a parsing bug when dealing with your specific tables that is causing the parser to not unwind its stack correctly. What I recommend is segmenting those tables into smaller strings, parsing those, and picking out abnormalities.

Tables are pretty nasty so there's a better chance something obscure is happening there.
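A minimal sketch of that segmenting approach, under some loud assumptions: `extractTables` and `findFailingSegments` are hypothetical helpers, the regular expression is a crude splitter that will cut nested tables at the first closing tag, and `parseFn` stands in for whatever parser is being tested (for example Himalaya's `parse`).

```javascript
// Hypothetical helper: pulls each <table>...</table> span out of the document.
// Non-greedy matching means nested tables are split at the first </table>.
function extractTables(html) {
  return html.match(/<table\b[\s\S]*?<\/table>/gi) || [];
}

// Hypothetical helper: runs `parseFn` on every segment and records which ones throw.
function findFailingSegments(segments, parseFn) {
  const failures = [];
  segments.forEach((segment, index) => {
    try {
      parseFn(segment);
    } catch (error) {
      failures.push({ index, length: segment.length, message: error.message });
    }
  });
  return failures;
}

// Usage with a stand-in parser; swap in the real one, e.g.
// const { parse } = require('himalaya');  (assumes himalaya is installed)
const sample = '<p>intro</p><table><tr><td>1</td></tr></table><table id="x"></table>';
const tables = extractTables(sample);
console.log(findFailingSegments(tables, (segment) => segment.length));
```

Segments that fail on their own, or that are unexpectedly long, are the ones worth inspecting by hand.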
