Can serialization/microsoft_cfb.ksy be made more browsable? #195

Closed
chelh opened this issue Jul 2, 2017 · 3 comments

Comments


chelh commented Jul 2, 2017

Hi there, I'm evaluating using Kaitai Struct to build parsers for a couple of file formats related to Microsoft Office VBA. It's great that there already exists a serialization/microsoft_cfb.ksy, since the VBA binary files use that as a container format. But when I apply that .ksy to a file and browse, I'm not able to browse to specific streams and storages. Instead I see most file contents under fat/entries as an array of s4. I imagine it was done this way due to the actual streams potentially being fragmented within the structure.

Can Kaitai Struct actually deal with this sort of fragmentation, and the .ksy is just in need of some TLC? Or can it not, and thus it's not appropriate for this file format?

Version of serialization/microsoft_cfb.ksy I'm using: microsoft_cfb.ksy.zip

Test file: Designations.bin.zip

What I expect to see (tree of storages and streams):
image

What I actually see (flat list of s4):
image

Member

GreyCat commented Jul 2, 2017

microsoft_cfb.ksy is a work-in-progress specification, so not everything is implemented.

As far as I understand, to get to the individual streams and storages, we need to travel through the directory (/dir) and discover the tree that describes them. That tree would then give us pointers to the FAT sectors (actually, minifat in the case of this file, which is also important, as we need to choose which table to use). After that, we need to join these FAT sectors to assemble a stream.
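
The table choice and chain walk described above might look roughly like this in application code. This is a minimal Python sketch, not part of Kaitai Struct or of microsoft_cfb.ksy: `follow_chain` and `pick_table` are hypothetical helper names, the tables are assumed to be plain lists of signed 32-bit integers (matching the s4 entries mentioned earlier), and the 4096-byte cutoff is the usual CFB value that a real reader would take from the file header.

```python
# Special FAT values from the CFB format, written as signed 32-bit
# integers because the .ksy exposes FAT entries as s4.
END_OF_CHAIN = -2  # 0xFFFFFFFE
FREE_SECTOR = -1   # 0xFFFFFFFF


def follow_chain(table, start):
    """Return the sector indices of the chain that begins at `start`."""
    chain = []
    seen = set()
    current = start
    while current != END_OF_CHAIN:
        # Guard against free/out-of-range entries and against loops.
        if current < 0 or current >= len(table) or current in seen:
            raise ValueError(f"corrupt chain at sector {current}")
        seen.add(current)
        chain.append(current)
        current = table[current]
    return chain


def pick_table(stream_size, fat, minifat, cutoff=4096):
    """Hypothetical helper: streams below the cutoff use the mini FAT."""
    return minifat if stream_size < cutoff else fat
```

For example, with `fat = [2, END_OF_CHAIN, 1, FREE_SECTOR]`, the chain starting at sector 0 visits sectors 0, 2 and 1 in that order. The root entry's own stream (the container for mini sectors) is an exception that this sketch does not handle: it is located through the regular FAT.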

Everything except reassembling byte arrays into a stream is available in KS right now. Reassembling is planned (it is obviously a pretty relevant feature used by a lot of formats), but there is no time estimate yet for when it will be done. We should probably finish substreams support (#44) first, and then implement a stream-that-assembles-substreams.

Other than that, please note that Kaitai Struct (as opposed to DFDL, for example) is designed to follow the physical structure of a stream, not to convert it on the fly into some other representation, so it's unlikely that you would get a tree like the one in your screenshot anyway. As the data is laid out in the file, you still have to travel through a directory tree (which is not flat in CFB). We could probably provide something like a body attribute in the leaf that would yield the complete reassembled stream, but, again, that would still need some work to be done in the language.

Meanwhile, such parsers are not completely useless. Applications can still use them and reassemble the streams from the sectors themselves.
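
As an illustration of doing the reassembly in the application, here is a small Python sketch that joins sectors into one byte string once the list of sector indices is known. `read_chain` and its parameters are hypothetical names, not an API of Kaitai Struct; the sketch assumes the raw file (or the already reassembled mini stream) is available as a `bytes` object.

```python
def read_chain(data, chain, sector_size, stream_size, base_offset):
    """Concatenate the sectors listed in `chain`, trimmed to `stream_size`.

    `base_offset` is where sector 0 starts inside `data`. For regular
    sectors read from the file this equals `sector_size`, because the
    header occupies the first sector-sized block; for mini sectors read
    from the mini stream it is 0.
    """
    parts = []
    for index in chain:
        start = base_offset + index * sector_size
        parts.append(data[start:start + sector_size])
    # The last sector is usually only partly used, so cut to the size
    # recorded in the directory entry.
    return b"".join(parts)[:stream_size]
```

With a toy 4-byte sector size, `read_chain(b"HDR!AAAABBBBCCCC", [2, 0], 4, 6, 4)` picks sector 2 and then sector 0 and keeps the first 6 bytes of the result.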

Member

GreyCat commented Jul 2, 2017

Actually, I've just pushed an implementation of the child, left_sibling and right_sibling accessors for dir_entry in microsoft_cfb.ksy: kaitai-io/kaitai_struct_formats@ad5cd4e. This, at least, allows one to traverse the whole directory.
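
A traversal built on those three accessors could be sketched as follows. The `Entry` class below is a stand-in written for this example, not the class generated from microsoft_cfb.ksy: it assumes each accessor returns either another entry or `None`, whereas the generated parser may signal a missing link differently. `walk` is likewise a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Stand-in for a parsed dir_entry (assumed shape, for illustration)."""
    name: str
    child: Optional["Entry"] = None
    left_sibling: Optional["Entry"] = None
    right_sibling: Optional["Entry"] = None


def walk(entry, path=""):
    """Yield the full path of `entry`, its siblings and all descendants."""
    if entry is None:
        return
    # Siblings form a binary tree, so visit it in order: left, self, right.
    yield from walk(entry.left_sibling, path)
    full = f"{path}/{entry.name}"
    yield full
    # `child` points at the root of the sibling tree one level down.
    yield from walk(entry.child, full)
    yield from walk(entry.right_sibling, path)
```

Calling `list(walk(root))` on the root entry produces one path per storage or stream, which is close to the tree view asked for at the top of this thread.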

Author

chelh commented Jul 3, 2017

Thanks! I think I'll keep checking in periodically and take a further look at using Kaitai Struct once that reässembling feature is implemented. For now I'll write my own parsing code on top of OpenMCDF.

chelh closed this as completed Jul 3, 2017