Support "clamping" of stream start in KaitaiStream #579
Comments
Um, can you elaborate on how it's different from the current substreams implementation and the proposed newer, more efficient substreams implementation (#44)?
It's likely I'm missing something simple, but with the current KaitaiStream implementation (at least in Python), if I wanted to begin parsing an arbitrary structure mid-stream (as in the original example) I believe I'd have to read in the entire stream's data from the starting point on, then pass that. e.g.:
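A minimal sketch of that workaround, using only `io.BytesIO` from the standard library rather than the real `KaitaiStream`. The record layout (a 4-byte little-endian body offset followed by a 4-byte body length, with the offset counted from the start of the record) and the name `parse_record` are assumptions made for illustration, not the Foo structure from this issue:

```python
import io
import struct


def parse_record(stream):
    """Parse a hypothetical record: u4le body offset, u4le body length.

    The offset is absolute, i.e. counted from position 0 of `stream`.
    """
    body_ofs, body_len = struct.unpack("<II", stream.read(8))
    stream.seek(body_ofs)  # absolute seek, like `pos:` in a ksy instance
    return stream.read(body_len)


# 20 bytes of unrelated data, then the record: header (8 bytes) + body.
record = struct.pack("<II", 8, 5) + b"hello"
outer = io.BytesIO(b"\x00" * 20 + record)

# Parsing in place goes wrong: the seek lands on byte 8 of the outer stream.
outer.seek(20)
wrong = parse_record(outer)

# Workaround: copy everything from the record start into a fresh stream,
# so that the record begins at position 0 of the new stream.
outer.seek(20)
inner = io.BytesIO(outer.read())
body = parse_record(inner)
```

The copy is what makes the absolute offset resolve correctly, and it is also the cost being discussed: everything after the starting point is read into memory.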
Without "clamping" or creating a new stream with the start of the structure at position 0, the Foo structure has an issue with mid-stream parsing, since the body is found with an absolute seek (as opposed to a relative seek). Here's the Python code generated for Foo.ksy by Kaitai Struct 0.8:
And here's some relevant KaitaiStream code:
For my case at least (and I haven't run the unit tests), using my ClampedKaitaiStream (or alternatively adding that code to KaitaiStream) works, since the ClampedKaitaiStream knows the original stream position and adjusts seek(), pos() and size() to be relative to it. This allows me to parse the Foo structure mid-stream, with the body's offset pointing to the right place, without having to read in the entire rest of the stream's data. In the example below, I'm imagining that the KaitaiStream class had code similar to ClampedKaitaiStream.
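The idea can be sketched without the Kaitai runtime. The class below is a hypothetical stand-in (it is neither the ClampedKaitaiStream from this issue nor part of the kaitaistruct package): it records the wrapped stream's position at construction time and makes `seek()`, `pos()` and `size()` relative to it.

```python
import io


class ClampedStream:
    """Hypothetical wrapper that treats the current position as offset 0."""

    def __init__(self, raw):
        self._io = raw
        self._base = raw.tell()  # remember where the structure starts

    def seek(self, n):
        # Positions passed in are relative to the clamped start.
        self._io.seek(self._base + n)

    def pos(self):
        return self._io.tell() - self._base

    def size(self):
        cur = self._io.tell()
        end = self._io.seek(0, io.SEEK_END)  # seek() returns the new position
        self._io.seek(cur)                   # restore the previous position
        return end - self._base

    def read_bytes(self, n):
        return self._io.read(n)


raw = io.BytesIO(b"\x00" * 20 + b"HEADbody")
raw.seek(20)
clamped = ClampedStream(raw)
clamped.seek(4)               # offset 4 within the structure, byte 24 of raw
body = clamped.read_bytes(4)
```

No data is copied here: the wrapped stream is shared, so its position moves as the structure is parsed.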
Again, it's likely I'm missing something simple in the current implementation, but this is the difference that I see. With my proposed solution I could parse this structure, which has an absolute offset, from the middle of a stream without reading in the rest of the stream as data to make a new substream. I haven't looked at the larger problem of improving substream efficiency (as well as handling all the related functionality and edge cases) as outlined in #44.
It looks like your sample code does exactly what ksc will generate when given something like:

    instances:
      my_foo:
        pos: 20
        size-eos: true
        type: foo

i.e. in Python, it will generate this:

    self._io.seek(20)  # seek
    self._raw__m_my_foo = self._io.read_bytes_full()  # read bytes till the end of stream
    _io__raw__m_my_foo = KaitaiStream(BytesIO(self._raw__m_my_foo))  # create new KaitaiStream
    self._m_my_foo = self._root.Foo(_io__raw__m_my_foo, self, self._root)  # pass it to Foo

Basically, what you want (i.e. substreams, "clamped" to position and size of part of original stream) is already supported in ksy, though not in a very efficient manner. #44 addresses that and proposes a new, cleaner interface. So far it looks very close to what you propose: instead of several operations (seek + read bytes + create new BytesIO out of bytes + create new KaitaiStream wrapping that BytesIO), it should be exactly one call, something like:

    io = self._io.substream(20, -1)  # or just (20), if we're talking about substream to end of current stream
    self._m_my_foo = self._root.Foo(io, self, self._root)
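To illustrate what a single-call substream with a start and an optional length could look like, here is a rough standard-library sketch. `Window` and its methods are hypothetical names chosen for this example; the interface actually proposed in #44 may differ, and nothing here is taken from the Kaitai runtime.

```python
import io


class Window:
    """Hypothetical bounded view over [start, start + length) of a stream."""

    def __init__(self, raw, start, length=-1):
        self._io = raw
        self._start = start
        if length < 0:  # -1 means "to the end of the parent stream"
            cur = raw.tell()
            length = raw.seek(0, io.SEEK_END) - start
            raw.seek(cur)
        self._length = length
        self._pos = 0  # position inside the window

    def seek(self, n):
        self._pos = n

    def pos(self):
        return self._pos

    def size(self):
        return self._length

    def read_bytes(self, n):
        # Never read past the end of the window.
        n = max(0, min(n, self._length - self._pos))
        self._io.seek(self._start + self._pos)
        data = self._io.read(n)
        self._pos += len(data)
        return data


parent = io.BytesIO(b"0123456789abcdefghijKLMNOPQRST")
view = Window(parent, 20, 4)
first = view.read_bytes(10)   # clipped to the 4-byte window
tail = Window(parent, 20)     # length omitted: runs to the end of parent
rest = tail.read_bytes(100)
```

Because each window tracks its own position and re-seeks the parent before every read, several windows can share one parent stream without copying its bytes. This sketch is not thread-safe.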
I'll keep my eye out for #44, thanks for the feedback. The actual use case that prompted this was parsing MachO executables ( https://github.com/kaitai-io/kaitai_struct_formats/blob/master/executable/mach_o.ksy ) out of Fat files, and I solved it in a similar way to you, by creating a new ksy file for the FAT format. Going back to the initial use case proposed here, it sounds like if I did want to handle this case efficiently with the new substream changes, something like the following would work.
Feel free to close as duplicate of #44 :)
Thanks for confirming! Let's continue the discussion in #44. I see that you already have an implementation for Python; please consider contributing it as part of that effort.
Currently it's challenging to parse non-trivial structures from a stream if the position is not at the beginning. As an example:
Sample ksy
Sample Python Code demonstrating issue
To support this, I'd request the ability to optionally "clamp" the stream in the constructor for KaitaiStream, so that the original position is saved and a few operations like seek() and size() can be adjusted to account for it. Additionally, it'd be handy to have KaitaiStruct.from_io() also support this parameter.
My kludge (for Python) is currently: