Reading garbage in types with both size and terminator #613

Closed
KOLANICH opened this issue Aug 11, 2019 · 3 comments

Comments

@KOLANICH

KOLANICH commented Aug 11, 2019

Some formats store null-terminated strings in fixed-size fields. Often the bytes after the terminator are not zero-filled but contain traces of whatever data previously occupied that address in the process's address space. Although it is possible to write a spec that extracts this data (quite inefficiently), it may make sense to support this out of the box.

@GreyCat
Member

GreyCat commented Sep 16, 2019

Could you clarify what exactly you mean? The current implementation of something like this:

seq:
  - id: foo
    size: 40
    terminator: 0

... is specifically engineered to have a maximum length of 40 bytes and to trim all the garbage after the "0" byte. If someone wants to parse that garbage instead, they are free to do something like:

seq:
  - id: foo
    size: 40
    type: terminated_string_and_garbage
types:
  terminated_string_and_garbage:
    seq:
      - id: the_string
        terminator: 0
      - id: garbage
        size-eos: true

Am I missing your point?
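
For illustration, the difference between the two specs above can be sketched in plain Python. This is a minimal emulation of the described behavior, not Kaitai Struct runtime code: the function names and the sample field are hypothetical, and it assumes the default behavior in which the terminator byte is consumed and not included in the string.

```python
from typing import Tuple


def read_trimmed(field: bytes, terminator: int = 0) -> bytes:
    """Emulate `size: 40` + `terminator: 0`: keep only the bytes before the terminator."""
    end = field.find(bytes([terminator]))
    # If no terminator is present, the whole fixed-size field is kept.
    return field if end == -1 else field[:end]


def read_string_and_garbage(field: bytes, terminator: int = 0) -> Tuple[bytes, bytes]:
    """Emulate the two-attribute subtype: the string, then everything after its terminator."""
    end = field.find(bytes([terminator]))
    if end == -1:
        raise EOFError("terminator not found within the field")
    # The terminator itself belongs to neither part (assumed default behavior).
    return field[:end], field[end + 1:]


# Hypothetical 40-byte field: "hello", a 0 terminator, then stale memory contents.
field = b"hello\x00" + b"stale-data".ljust(34, b"\xcc")

trimmed = read_trimmed(field)                       # b'hello'
the_string, garbage = read_string_and_garbage(field)  # b'hello' and the 34 trailing bytes
```

In the second form the trailing 34 bytes stay available for inspection instead of being discarded.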

@KOLANICH
Author

KOLANICH commented Sep 16, 2019

Yes, but it has the overhead of storing the same bytes twice. Also, I am not sure it correctly handles the case where the terminator is not within these 40 bytes (for example, the native parser reserves 41 bytes, sets the 41st to 0, and then uses strncpy to copy 40 bytes in front of it, so the 40 bytes may contain no terminator at all). So I feel it should be a built-in feature activated by a compiler flag, or maybe a separate built-in type.
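
The native-parser behavior described above can be mimicked in a few lines of Python. This is only a sketch of the described steps, not the code of any real parser; `native_read_name` and the sample fields are hypothetical.

```python
FIELD_SIZE = 40


def native_read_name(field: bytes) -> bytes:
    """Mimic: reserve 41 bytes, keep the 41st at 0, copy up to 40 bytes, read a C string."""
    buf = bytearray(FIELD_SIZE + 1)   # 41 zero bytes, so the guard byte buf[40] is already 0
    n = min(len(field), FIELD_SIZE)
    # Plain copy; a real strncpy would zero-pad after an embedded 0,
    # but the resulting C string is the same.
    buf[:n] = field[:n]
    return bytes(buf[:buf.index(0)])  # C-string read: stop at the first 0 byte


# A field that uses all 40 bytes and therefore has no terminator of its own.
full_field = b"A" * FIELD_SIZE
name = native_read_name(full_field)   # all 40 bytes; only the guard byte ends the string

# A shorter string followed by a terminator and stale bytes.
short_name = native_read_name(b"hi\x00junk".ljust(FIELD_SIZE, b"\xcc"))  # b'hi'
```

The first case is the one in question: the on-disk field is valid for the native parser even though none of its 40 bytes is a terminator.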

@GreyCat
Member

GreyCat commented Sep 17, 2019

The overhead depends on the implementation. With a cleaner implementation of substreams (#44), it won't be an issue.

> Also I am not sure it correctly processes the cases when the terminator is not within these 40 bytes

Sounds exactly like a use case for eos-error: false.

So, in other words, it's already implemented and I don't see why we should change these designs.
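
As a rough illustration of the eos-error: false suggestion, the following Python sketch emulates a terminated read inside a 40-byte substream. It assumes that eos-error: false makes the read stop at the end of the substream instead of raising an error when no terminator is found; `read_lenient` and its `eos_error` parameter are hypothetical names, not part of the Kaitai Struct runtime.

```python
from typing import Tuple


def read_lenient(field: bytes, eos_error: bool = True) -> Tuple[bytes, bytes]:
    """Split a fixed-size field into (string, garbage) at the first 0 byte.

    With eos_error=False, a missing terminator is tolerated: the whole field
    becomes the string and the garbage is empty.
    """
    end = field.find(b"\x00")
    if end == -1:
        if eos_error:
            raise EOFError("end of field reached, but no terminator encountered")
        return field, b""
    return field[:end], field[end + 1:]


# A 40-byte field with no terminator at all.
no_terminator = b"B" * 40
lenient = read_lenient(no_terminator, eos_error=False)  # (all 40 bytes, b'')
```

With the strict setting the same input raises an error, which is the behavior the earlier comment was worried about.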
