Maximum call stack size exceeded on a large string, increasing the process' stack size works tho. #101
Comments
A reproducible test case with data would be really helpful! There are a few functions where recursion is used. From a quick reading, they all relate to nesting levels. I am surprised this happens with files that are not more than 10 levels deep; this might indicate a parsing bug. I'm glad you found a workaround. I think optimizing these recursion problems is worth some time. It's also been a goal of mine for quite some time to make Himalaya streamable, so that the whole string doesn't need to be loaded into memory at once. That's a much larger change, so maybe fixing these recursions can buy me some more time.
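For illustration, recursion whose depth follows the nesting level can usually be replaced by a loop over an explicit stack, so that deep or malformed trees no longer consume call-stack frames. The following is a minimal sketch, not Himalaya's actual code; the node shape (an optional `children` array) and the function name `maxDepth` are assumptions made for the example.

```javascript
// Hypothetical helper: computes the deepest nesting level of a parsed tree
// without recursion. Assumes each node may carry a `children` array.
function maxDepth(nodes) {
  let deepest = 0;
  // Each stack entry pairs a node with its depth (top-level nodes are depth 1).
  const stack = nodes.map((node) => ({ node, depth: 1 }));
  while (stack.length > 0) {
    const { node, depth } = stack.pop();
    if (depth > deepest) deepest = depth;
    for (const child of node.children || []) {
      stack.push({ node: child, depth: depth + 1 });
    }
  }
  return deepest;
}
```

Run over the output obtained with the larger stack size, something like this would also show whether the parsed tree really stays under 10 levels.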
Sorry man, I wish I could share it, but it’s some private user data.
I’ve been trying for like an hour to make some files that reproduce it. A
million bare divs, a dozen levels deep, works. Some random HTML and text
repeated 1m times doesn’t make it hiccup. All I manage to do is run out
of heap. I could parse a 100 meg string without a hitch. Good on you!!!
In the original files there’s a lot of text and a lot of links, and the tags
do have a lot of attributes. The deepest level it goes is up to 13 and there
are a lot of tables with nested tags and text.
I can’t easily recreate the structure in an automatic way, sorry. Maybe you
could give it a try?
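A generator along these lines might get closer to that structure than bare divs. It is only a sketch: the function name `makeTableHtml` and every tag and attribute choice in it are invented for illustration and are not taken from the private files.

```javascript
// Hypothetical generator for synthetic test input; all tag and attribute
// choices here are invented for illustration.
function makeTableHtml(rows, depth) {
  // Wrap a linked, attribute-heavy cell in `depth` nested <div> levels.
  let cell = '<a href="#" class="link" data-id="1" title="t">some text</a>';
  for (let i = 0; i < depth; i++) {
    cell = `<div class="level-${i}" id="d${i}">${cell}</div>`;
  }
  const row = `<tr><td align="left" valign="top">${cell}</td></tr>`;
  return `<table border="0" cellpadding="2">${row.repeat(rows)}</table>`;
}
```

Calling `makeTableHtml(5000, 9)` gives a single table of 5000 rows whose links sit 13 elements deep (table, tr, td, nine divs, a).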
…On Sun, 25 Nov 2018 at 20:58, Chris Andrejewski ***@***.***> wrote:
A reproducible test case with data would be really helpful!
There are a few functions where recursion is used. From a quick reading,
they all relate to nesting levels. I am surprised files this large are not
>10 levels deep, this might indicate a parsing bug.
I'm glad you found a workaround. I think optimizing these recursion
problems is worth some time.
It's also been a goal of mine for quite some time to make Himalaya
stream-able such that the whole string doesn't need to be loaded into
memory at once. That's a much larger change, so maybe fixing these
recursions can buy me some more time.
Unfortunately, I don't think I can reproduce it on my own. I also played around with some nested divs and it handled them fine. If I had to guess, since you mention tables, there could be a parsing bug with your specific tables that causes the parser to not unwind its stack correctly. What I recommend is segmenting those tables into smaller strings, parsing those, and picking out abnormalities. Tables are pretty nasty, so there's a better chance something obscure is happening there.
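The segmenting step could be sketched roughly as follows. The helper name `findFailingTables` is hypothetical, and the regular expression is a deliberate simplification that cuts nested tables at the first closing tag, so treat it as a way to narrow down the search rather than an exact extractor. The parser is passed in as an argument so the sketch does not depend on any particular library version.

```javascript
// Hypothetical helper: parses each <table> separately and reports the ones
// that throw. `parse` is passed in; in practice it would be Himalaya's parse.
function findFailingTables(html, parse) {
  // Non-greedy match: nested tables are cut at the first closing tag, which
  // is acceptable for narrowing things down but not for exact extraction.
  const tables = html.match(/<table\b[\s\S]*?<\/table>/gi) || [];
  const failures = [];
  tables.forEach((source, index) => {
    try {
      parse(source);
    } catch (error) {
      failures.push({ index, length: source.length, message: error.message });
    }
  });
  return failures;
}
```

With Himalaya installed, the call would presumably look like `findFailingTables(html, require('himalaya').parse)`, run at the default stack size so that the failing segments actually throw.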
Hey man, great lib!
The stack blows up when trying to parse() some large strings, around a million characters or so.
I didn't look into how you're parsing the HTML string, so I don't know exactly what the reason is: too long of a string, too many HTML elements, or too many levels in the tree. The files I've got are between one and two MB, ~5-6k elements, and the elements should be nested less than ten levels deep.
I'm on a Mac, where the default stack size is 8k; when I increase the stack for the node process to 64k it works OK.
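To see how much recursion a given stack setting allows, a small probe like the one below can be run before and after changing it. This is a sketch for comparison only; the function name is made up, and the number it returns depends on the Node version and on how large each frame is.

```javascript
// Counts how many nested calls fit on the stack before V8 throws
// "RangeError: Maximum call stack size exceeded".
function measureMaxRecursionDepth() {
  let depth = 0;
  function recurse() {
    depth += 1;
    recurse();
  }
  try {
    recurse();
  } catch (error) {
    // Anything other than a stack overflow is unexpected here.
    if (!(error instanceof RangeError)) throw error;
  }
  return depth;
}
```

One way to raise the limit for a single run is V8's `--stack-size` option, which takes a value in KB (for example `node --stack-size=4000 script.js`); a value larger than the operating system's own stack limit can crash the process instead of throwing.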