New API for on-demand parsing #133
Here are some initial thoughts on how to proceed, hopefully leading to some discussion.

**Problem 1: How to represent the delayed parsing?**

So the idea is that upon encountering a block, we would defer parsing it. I was worried that we'd need to know not only byte information but also subword information; thankfully, blocks appear to be 32-bit-word-aligned and have 32-bit-word-multiple lengths! So, it seems we'd want some description of a computation to be performed, closed over the information it needs to do so. I don't think […]. I'm thinking we might want to do something like an […].

**Problem 2: Altering pure code that expects those delayed results**

There is a lot of pure code that expects to just read some fields from parsed entries. It seems to force us into one of two alterations:

A. We make all such code effectful, so that it can force the (sub-)entries as it tries to access them, or
B. We use different types for delayed entries and forced entries, and keep these pure functions working over forced entries.

The likely problem with option B is that entries are recursive, so we'd end up forcing all the sub-entries even if we only access some of them.
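To make the "description of a computation, closed over the information it needs" idea concrete, here is a minimal Haskell sketch. All names are hypothetical and not part of the current llvm-pretty-bc-parser API; it only illustrates that, because blocks are 32-bit-word-aligned with word-multiple lengths, plain word offsets suffice to locate a deferred block:

```haskell
-- Hypothetical sketch: a delayed block parse, closed over the
-- word-aligned region it covers and the action that decodes it.
data DelayedBlock a = DelayedBlock
  { dbWordOffset :: Int   -- start of the block, in 32-bit words
  , dbWordLength :: Int   -- length of the block, in 32-bit words
  , dbParse      :: IO a  -- the decoding computation, run on demand
  }

-- Forcing a delayed block simply runs its decoding computation.
forceBlock :: DelayedBlock a -> IO a
forceBlock = dbParse
```

Whether `dbParse` lives in `IO`, `ST`, or a dedicated parser monad is exactly the Problem 2 design question.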
I'd really like to avoid using […].

One hypothetical approach (possibly naive): it seems like the initial parsing will need to read the spine of the AST at a minimum; if the AST representation object was extended such that each node was either the actual parsed/resolved object or an […]
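One way to sketch the "each node is either the parsed/resolved object or a recipe for producing it" idea in Haskell (all names hypothetical; the `IORef` stands in for whatever update-in-place mechanism the real design would use):

```haskell
import Data.IORef

-- Hypothetical sketch: a node in the AST spine is either already
-- resolved or still a computation that will parse it.
data NodeState a = Resolved a | Unresolved (IO a)

newtype Node a = Node (IORef (NodeState a))

-- Build a node whose contents have not been parsed yet.
mkUnresolved :: IO a -> IO (Node a)
mkUnresolved act = Node <$> newIORef (Unresolved act)

-- Force a node, overwriting the recipe with its result so that
-- subsequent accesses are cheap lookups rather than re-parses.
resolve :: Node a -> IO a
resolve (Node ref) = do
  st <- readIORef ref
  case st of
    Resolved x   -> pure x
    Unresolved m -> do
      x <- m
      writeIORef ref (Resolved x)
      pure x
```

The mutation is what makes this awkward for the existing pure accessors, which is the tension described in Problem 2 above.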
What @kquick is suggesting is pretty close to how our ELF parser (elf-edit) currently works. The initial parsing step only parses out the necessary metadata to understand the file and is mostly index structures. That returns what is effectively an index object that you can query for other information later, thus decoding on demand. You could imagine either
Leaving it up to callers is kind of nice, since it enables them to control what memory is retained (to the extent possible in Haskell...). There is a bit of a challenge with entities like types or global references that might need to be re-parsed often. I'm not quite sure what I think about that at the moment. A simple interface might look like:
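One possible shape for such an index-first interface, as a toy Haskell sketch (all names and stand-in types are invented for illustration; the real bitcode index would record bit-level offsets, not this toy encoding):

```haskell
import qualified Data.Map.Strict as Map
import           Data.Map.Strict (Map)

-- Toy stand-ins for the real representations (hypothetical).
type FuncName = String
newtype RawBytes = RawBytes String          -- stand-in for the on-disk bytes
data Function = Function
  { fnName :: FuncName
  , fnBody :: String
  } deriving (Eq, Show)

-- The index records where each function lives without decoding it,
-- in the style of elf-edit's metadata-only initial pass.
data ModuleIndex = ModuleIndex
  { idxRaw       :: RawBytes
  , idxFunctions :: Map FuncName Int        -- name -> offset (toy)
  }

-- Step 1: a cheap pass that only builds the index.
parseIndex :: RawBytes -> ModuleIndex
parseIndex raw@(RawBytes s) =
  ModuleIndex raw (Map.fromList (zip (words s) [0 ..]))

-- Step 2: decode one function on demand; the caller decides what
-- to retain, so memory use tracks what was actually queried.
parseFunction :: ModuleIndex -> FuncName -> Maybe Function
parseFunction idx nm = do
  _off <- Map.lookup nm (idxFunctions idx)
  pure (Function nm ("<body of " ++ nm ++ ">"))
```

Returning `Maybe` (or `Either` with a real error type) at each query point is what makes the "re-parsed often" entities like types and globals awkward: every access site has to handle decoding failure.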
We could probably eagerly parse all of the types and globals, while making individual functions our unit of laziness. That would save us from the worst of the performance problems.
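That eager/lazy split could be sketched as a module record with eager metadata fields and delayed function bodies (again, all names and stand-in types here are hypothetical, not the library's real AST):

```haskell
import qualified Data.Map.Strict as Map
import           Data.Map.Strict (Map)

-- Toy stand-ins for the real AST types (hypothetical).
type Type     = String
type Global   = String
type Symbol   = String
type Function = String

-- Types and globals are small and referenced from many places, so
-- they are parsed at load time; each function body is decoded only
-- when it is demanded.
data LazyModule = LazyModule
  { lmTypes     :: [Type]                    -- eager
  , lmGlobals   :: [Global]                  -- eager
  , lmFunctions :: Map Symbol (IO Function)  -- decoded on demand
  }

-- Decode a single function, if the module defines it.
loadFunction :: LazyModule -> Symbol -> IO (Maybe Function)
loadFunction m sym =
  case Map.lookup sym (lmFunctions m) of
    Nothing  -> pure Nothing
    Just act -> Just <$> act
```

Only the functions a client actually asks for are decoded, which is the win for the link-everything, exercise-little workloads described in this issue.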
Currently, an entire bitcode module is parsed into AST nodes when it is loaded. This is a problem for the kinds of bitcode files we want to be able to handle, where we link together a lot of code (e.g. the C++ standard library) but only exercise a small amount of it.
We should refactor this library so that it instead only parses portions of the bitcode file as they are needed, as this will reduce the memory usage and improve the performance of this library a great deal for our primary use cases.