Data views / substreams support #44
Construct had a substream class that was used by the Prefixed field. I am not familiar enough with Kaitai to provide an implementation; I'm just posting it for reference.
@arekbulski Yes, that's exactly what we were looking for here :)
On some runtimes (the non-statically typed ones) the feature might actually be counterproductive, especially if the field inside has only a few bytes. Remember that, on Python at least, you are replacing a C-implemented bytes copy and a C-implemented BytesIO with something where each read is wrapped in pure-Python red tape, and you do that to avoid a C-implemented copy of a few bytes. If the subcon is mega-size, well, that changes things. You might consider adding a property to the parsed field, so that at compile time it emits one code path or the other. The user would need to opt in to this feature.
A valid concern, but I'd suggest that we benchmark these things first and then see if we need any workarounds. Maybe in languages like that, the implementation of … Also, I'd suggest that we start investigating whether any bound substream class implementations are available in other languages.
Agreed, we need a benchmark.
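As a starting point for such a benchmark, here is a rough Ruby timing harness. It is only a sketch: `CheckedView`, `header_via_copy`, `header_via_view` and `compare` are hypothetical names invented for illustration, not part of any Kaitai Struct runtime, and the workload (reading a 4-byte header from a field of a given size) is an assumption about what is worth measuring.

```ruby
require 'benchmark'
require 'stringio'

# Hypothetical minimal view: bounds-checks every read in pure Ruby.
class CheckedView
  def initialize(io, start, len)
    @io = io
    @start = start
    @len = len
    @pos = 0
  end

  def read(count)
    raise EOFError, 'read past end of view' if @pos + count > @len

    @io.seek(@start + @pos)
    @pos += count
    @io.read(count)
  end
end

# Eager path: copy the whole field, wrap it, then read a 4-byte header from the copy.
def header_via_copy(io, field_len)
  io.seek(0)
  StringIO.new(io.read(field_len)).read(4)
end

# View path: read the 4-byte header through the view, leaving the rest untouched.
def header_via_view(io, field_len)
  CheckedView.new(io, 0, field_len).read(4)
end

# Times both paths for one field size and returns the raw timings in seconds.
def compare(field_len, iterations: 500)
  io = StringIO.new(('x' * field_len).b)
  copy_time = Benchmark.realtime { iterations.times { header_via_copy(io, field_len) } }
  view_time = Benchmark.realtime { iterations.times { header_via_view(io, field_len) } }
  { field_len: field_len, copy_s: copy_time, view_s: view_time }
end

# One tiny field and one large field, matching the two cases discussed above.
[8, 1_000_000].each { |field_len| p compare(field_len) }
```

The numbers will vary by machine and Ruby version, so the harness prints raw timings rather than drawing a conclusion.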
Hi, maybe this is a separate issue, but I was wondering whether it is possible to extend this to custom processors. The input to the processor would be the view as described in the issue. The output would be a higher-level object that exposes the processed content as a new Kaitai stream/view, with the addition of some state management methods, such as close() for when the root object is destroyed. The resulting view can then again be used with the data view as described in this issue for parsing the processed content. In my case I have a filesystem with potentially large encrypted partitions. What I would like to do is decrypt the partition to disk and provide a mmap'ed file to Kaitai for further parsing. But I can imagine cases where the view returned by the processor would do the processing on demand (like the simple xor that is already available). I think my case can be solved by using a custom type that does the decryption and manually calls Kaitai to parse its content afterwards. State management of the mmap'ed file would have to be done out of band.
@jvisser Sure, it makes sense. We just need to think out the API carefully.
Hey folks, please take a look, I took a first pass at implementing this for Ruby only:
Primarily, changes in generated code will look like this:
def _read
@len1 = @_io.read_u4le
- @_raw_block1 = @_io.read_bytes(len1)
- _io__raw_block1 = Kaitai::Struct::Stream.new(@_raw_block1)
- @block1 = Block.new(_io__raw_block1, self, @_root)
+ _io_block1 = @_io.substream(len1)
+ @block1 = Block.new(_io_block1, self, @_root)
So, 2 lines instead of 3, no pre-reading of bytes into memory, and a cleaner API overall. I will launch the official tests now, but judging from local runs, the tests still work as they did before. Please tell me if you think it's a good idea and/or if we can make this work for other languages.
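To make the idea behind `substream(len1)` concrete, here is a minimal Ruby sketch of a bounded view over a seekable parent IO. It is an assumption-laden illustration: `BoundedView` and its methods are hypothetical names, not the class the Ruby runtime actually uses, and restoring the parent position after each read is one design choice among several.

```ruby
require 'stringio'

# Hypothetical bounded view over a seekable parent IO (illustrative only,
# not the class used by the Kaitai Struct Ruby runtime).
class BoundedView
  attr_reader :pos

  def initialize(parent_io, start, len)
    @parent_io = parent_io
    @start = start
    @len = len
    @pos = 0
  end

  def size
    @len
  end

  def eof?
    @pos >= @len
  end

  def seek(new_pos)
    unless new_pos.between?(0, @len)
      raise ArgumentError, "position #{new_pos} is outside 0..#{@len}"
    end

    @pos = new_pos
  end

  # Reads up to `count` bytes, never crossing the end of the view, and
  # restores the parent's own position afterwards.
  def read(count)
    count = [count, @len - @pos].min
    return ''.b if count <= 0

    saved = @parent_io.pos
    @parent_io.seek(@start + @pos)
    data = @parent_io.read(count) || ''.b
    @parent_io.seek(saved)
    @pos += data.bytesize
    data
  end
end

parent = StringIO.new("\x04\x00\x00\x00ABCDtail".b)
len = parent.read(4).unpack1('V')            # little-endian u4 length prefix
view = BoundedView.new(parent, parent.pos, len)
parent.seek(parent.pos + len)                # the parent skips the block without copying it
```

The view only touches the parent when something is actually read from it, which is the property that makes this approach attractive for large blocks.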
Added a similar implementation for Java: see kaitai-io/kaitai_struct_java_runtime@ee61d73 for the runtime change and kaitai-io/kaitai_struct_compiler@b529bc6b for the compiler. The main difference is that Java has two KaitaiStream implementations, and the good news is that one of them (the ByteBuffer-based one) already has all the slicing/limits machinery in place, ready for a substreams implementation. Generated code-wise, it's a very similar change:
private void _read() {
this.len1 = this._io.readU4le();
- this._raw_block1 = this._io.readBytes(len1());
- KaitaiStream _io__raw_block1 = new ByteBufferKaitaiStream(_raw_block1);
- this.block1 = new Block(_io__raw_block1, this, _root);
+ KaitaiStream _io_block1 = this._io.substream(len1());
+ this.block1 = new Block(_io_block1, this, _root);
The big caveat is that …
public byte[] _raw_block1() { return _raw_block1; }
with something like
public byte[] _raw_block1() {
KaitaiStream io = block1._io();
long oldPos = io.pos();
io.seek(0);
byte[] allBytes = io.readBytesFull();
io.seek(oldPos);
return allBytes;
}
Please consider also merging kaitai-io/kaitai_struct_java_runtime#28 and related in order to give tools a chance to correctly visualize substreams.
@GreyCat Note that the adoption of …
"ExprIoPos": {
- "status": "passed",
- "elapsed": 0.000336907,
+ "status": "failed",
+ "elapsed": 0.000463687,
+ "failure": {
+ "file_name": "./spec/ruby/expr_io_pos_spec.rb",
+ "line_num": 4,
+ "message": "undefined method `size' for #<Kaitai::Struct::SubIO:0x0000000002666050 @parent_io=#<File:src/expr_io_pos.bin>, @parent_start=0, @parent_len=16, @parent_end=16, @pos=10, @closed=false>",
+ "trace": "/home/travis/build/kaitai-io/ci_targets/runtime/ruby/lib/kaitai/struct/struct.rb:145:in `size'\n
+ /home/travis/build/kaitai-io/ci_targets/compiled/ruby/expr_io_pos.rb:30:in `_read'\n
+ /home/travis/build/kaitai-io/ci_targets/compiled/ruby/expr_io_pos.rb:25:in `initialize'\n
+ /home/travis/build/kaitai-io/ci_targets/compiled/ruby/expr_io_pos.rb:17:in `new'\n
+ /home/travis/build/kaitai-io/ci_targets/compiled/ruby/expr_io_pos.rb:17:in `_read'\n
+ /home/travis/build/kaitai-io/ci_targets/compiled/ruby/expr_io_pos.rb:12:in `initialize'\n
+ /home/travis/build/kaitai-io/ci_targets/runtime/ruby/lib/kaitai/struct/struct.rb:26:in `new'\n
+ /home/travis/build/kaitai-io/ci_targets/runtime/ruby/lib/kaitai/struct/struct.rb:26:in `from_file'\n
+ /home/travis/build/kaitai-io/ci_targets/tests/spec/ruby/expr_io_pos_spec.rb:6:in `block (2 levels) in <top (required)>'\n
+ /home/travis/.rvm/gems/ruby-3.0.4/gems/rspec-core-3.11.0/lib/rspec/..."
+ },
"is_kst": true
},
Great catch, thanks! It's actually very easy to fix, let me do that.
Zero-copy substreams are great, but we should consider that they would typically require a seekable stream where … So at least there should be a command-line option to bring the traditional … It's basically the same issue as in kaitai-io/kaitai_struct_cpp_stl_runtime#46 (comment).
Makes perfect sense. We'll be supporting both anyway, e.g. for the sake of supporting the existing interface for custom processing, which works exclusively on byte arrays. Let's start with a command-line switch to toggle between the two.
But as soon as you have read the bytes, you can just create a seekable stream that you can pass around, right?
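The last point can be sketched in Ruby as follows: when the input is not seekable, the bytes are read once and wrapped in an in-memory stream that is seekable. This is only an illustration of the traditional byte-array path; `buffered_substream` is a hypothetical helper name, and the pipe merely stands in for a non-seekable input.

```ruby
require 'stringio'

# Hypothetical fallback: eagerly read `len` bytes from a possibly non-seekable
# input and wrap them in a seekable in-memory stream.
def buffered_substream(io, len)
  data = io.read(len)
  if data.nil? || data.bytesize < len
    raise EOFError, "requested #{len} bytes, got #{data ? data.bytesize : 0}"
  end

  StringIO.new(data)
end

reader, writer = IO.pipe          # a pipe stands in for a non-seekable input
writer.write('HEADpayload')
writer.close
reader.read(4)                    # consume the header sequentially
sub = buffered_substream(reader, 7)
sub.seek(3)                       # seeking works on the in-memory copy
tail = sub.read(4)
reader.close
```

The cost is the copy that zero-copy substreams try to avoid, which is why a switch between the two modes is being discussed.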
The `_raw_*` fields are no longer available in default compiler mode using zero-copy substreams (see kaitai-io/kaitai_struct#44), so this test was actually broken for a while.
Checks whether creating a substream that exceeds the remaining stream size throws an EOF error. This turned out not to be the case in Ruby with zero-copy substreams (kaitai-io/kaitai_struct#44), so it's clearly beneficial to have this covered.
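A minimal Ruby sketch of the check this test is after: before handing out a view, compare the requested length with the bytes remaining in the parent and fail early. `checked_substream_range` is a hypothetical helper invented for illustration; the actual runtime may perform this check differently.

```ruby
require 'stringio'

# Hypothetical helper: validates a substream request against the bytes that
# remain in the parent, then advances the parent past the block.
# Returns [start, len] describing the view's range within the parent.
def checked_substream_range(io, len)
  remaining = io.size - io.pos
  if len > remaining
    raise EOFError, "requested #{len} bytes, but only #{remaining} bytes available"
  end

  start = io.pos
  io.seek(start + len)
  [start, len]
end

io = StringIO.new('abcdef')
io.read(2)
range = checked_substream_range(io, 3)   # fits: 4 bytes were remaining
```

Without such a check, an oversized substream would only fail later, when a read inside it runs past the end of the parent.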
Currently, all languages use something similar to this code when it's time to create a substream and parse objects from it:
This is inefficient, especially for larger data streams: everything has to be loaded into memory and then parsed from there. A more efficient approach would be to use some sort of substreams in the manner of data views, i.e. something like:
or, for instances that have a known pos field, something like this:
The devil, of course, is in the details:
_raw_* byte arrays and parsing using substreams.
repeat constructs exist on this field?
process - these actually require reading and re-processing the whole byte array in memory. Or should we re-implement them as stream-in => stream-out converters as well?
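As an illustration of the stream-in => stream-out idea for process, here is a hedged Ruby sketch of a view that applies a single-byte XOR lazily, only to the bytes actually read. `XorView` is a hypothetical class, not an existing Kaitai Struct API, and a single-byte key is assumed so that the result does not depend on the read position.

```ruby
require 'stringio'

# Hypothetical stream-in => stream-out converter: wraps a seekable IO and
# XORs bytes with a single-byte key as they are read.
class XorView
  def initialize(io, key)
    @io = io
    @key = key
  end

  def pos
    @io.pos
  end

  def seek(new_pos)
    @io.seek(new_pos)
  end

  def size
    @io.size
  end

  # Decodes only the requested chunk; nothing is pre-processed in memory.
  def read(count)
    chunk = @io.read(count)
    return ''.b if chunk.nil?

    chunk.bytes.map { |byte| byte ^ @key }.pack('C*')
  end
end

encoded = 'hello'.bytes.map { |byte| byte ^ 0x5A }.pack('C*')
view = XorView.new(StringIO.new(encoded), 0x5A)
view.seek(1)
middle = view.read(3)
```

Processors such as decompression cannot be mapped byte-for-byte like this, so they would still need buffering or a more elaborate converter.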