Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question on switch functionality #207

Closed
JaapAap opened this issue Jul 14, 2017 · 13 comments
Closed

Question on switch functionality #207

JaapAap opened this issue Jul 14, 2017 · 13 comments
Labels

Comments

@JaapAap
Copy link

JaapAap commented Jul 14, 2017

Suppose I have a set of messages that contain a 12 bit preamble with the message ID followed by a message body of 160 bits, with different contents for different messages. Can the current switch functionality support such a case, in which the message body does not start on a byte-boundary?

@GreyCat
Copy link
Member

GreyCat commented Jul 14, 2017

It's not really related to switch, rather than it is an issue of alignment (see #12).

Right now we've implemented a very simple alignment scheme, that is everything is byte-aligned, except for sequences of types that are known to require bit-level alignment (that is basically bX types). So, if you do:

seq:
  - id: one
    type: b1
  - id: two
    type: b7

that compiles into

    m_one = m__io->read_bits_int(1);
    m_two = m__io->read_bits_int(7);

but as soon as you'll interject these bX with some other stuff, like u1, it would automatically insert byte alignment statement, i.e. this one:

seq:
  - id: one
    type: b1
  - id: cccombo_breaker
    type: u1
  - id: two
    type: b7

compiles into:

    m_one = m__io->read_bits_int(1);
    m__io->align_to_byte(); // <================ this one!
    m_cccombo_breaker = m__io->read_u1();
    m_two = m__io->read_bits_int(7);

As you might have guessed, both "switch types" and "user types" fall into "other stuff" category, i.e. they would invoke byte alignment statement. So, we need some sort of flexible alignment specification system that would at least say which types / attributes require which alignment statements before / after them.

So, I suggest to continue this discussion in #12, if it's relevant.

@JaapAap
Copy link
Author

JaapAap commented Jul 14, 2017

Happy to discuss elsewhere, but that discussion mainly relates to specifying ways to force certain types off byte alignment, while I am having a bit-alignment issue. Even if the align_to_byte(); call would be a no-op, if I understand well, the read_bytes routine still assumes that it starts on a byte boundary (obviously the same problem is true for read_u1() etc. So in my case, I would need all alignments to be turned of.

If the idea is to -in the end - replace the current 'automatic' alignment by 'user-defined' alignment and supporting a 'no-alignment' case would become possible that would be great. While these issues are not orthogonal, I am not convinced that the existing discussion is the right place to continue. Thoughts?

@koczkatamas
Copy link
Member

Could you show me how would you parse those 160 bits? As 20 bytes? So basically it's a data format which stores every byte shifted by 4 bits?

pX = preamble bit X
bX = data bit X

           [       --- byte 1 ---      ] [       --- byte 2 ---      ]
input : ...[p8|p9|p10|p11 | b0|b1|b2|b3] [b4|b5|b6|b7 | b8|b9|b10|b11]...
output:                    [b0|b1|b2|b3 | b4|b5|b6|b7] [b8|b9|b10|b11...
                           [      --- byte 0 ---     ] [      --- byte 1 ---     ]

@GreyCat
Copy link
Member

GreyCat commented Jul 14, 2017

Yeah, you're completely right, it's not a very trivial issue. read_bytes needs to be adapted — or, rather, compiler needs to call either read_bytes (which is byte-aligned version) or something like read_bytes_unaligned (which should cleverly shift all bytes read and prepend remaining bit buffer, if it exists). So, yeah, we'll need "unaligned" versions of all reading primitives as well.

My idea is actually this and that issue have a lot in common. Basically, the difference in generated code should be insertion/abscence of certain align(...) calls before and after reading calls — be it:

  • multi-byte-level alignment,
  • 1-byte alignment,
  • multi-bit-level alignment,
  • or no alignment at all (effectively a 1-bit alignment)

That, and, of course, bit-level reading primitives. So my thought was to make it look the same for end-user.

@JaapAap
Copy link
Author

JaapAap commented Jul 14, 2017

Thanks for that - indeed, I agree that this approach addresses both issues.

@tamas: the 160 bits contain a variety of bit-fields. Essentially, the data structure in my example is 172 bits of which the first 12 bits contain the ID which determines which of a number of message formats to select. This requires me to treat the 12 bits ID separately as I can't simply parse them as part of each of the messages (hence my reference to the switch).

@koczkatamas
Copy link
Member

koczkatamas commented Jul 14, 2017

I asked this because the byte arrays are usually byte-aligned, so maybe we don't need a read_bytes_unaligned method here. And bX types are already parsed as integers, so if the align_to_byte call wouldn't be there before the switch then maybe you could already use the bX types to parse your data structure. (Of course if you reach a byte boundary you can switch to uX from there.)

Edit

So if we look at this example:

meta:
  id: testalign
seq:
  - id: preamble
    type: b12
  - id: data
    type:
      switch-on: preamble
      cases:
        0x504: type1
types:
  type1:
    seq:
      - id: field1
        type: b4
      - id: field2
        type: b8

Currently this generates the following code (alignToByte is not commented out):

image

If I comment the alignToByte out and re-parse the input (you can do this in the WebIDE by pressing Ctrl+Enter in the JS window), then it will read the field1 from the b13..b16 bits.

If my input is 50 4b 03, then this is the parsed data structure:

image

@JaapAap
Copy link
Author

JaapAap commented Jul 14, 2017

Thanks Tamas, I understand what you are aiming at and thanks for showing me how to achieve it.

@GreyCat
Copy link
Member

GreyCat commented Jul 14, 2017

@JaapAap There is an initiative to migrate from these intermediate byte arrays to "asking for substreams" directly in #44. But this still needs that bit-level support anyway. Even if instead of:

    m__raw_foo = m__io->read_bytes(4);
    m__io__raw_foo = new kaitai::kstream(m__raw_foo);
    m_foo = new foo_t(m__io__raw_foo, this, m__root);

we'll be doing something like:

    m__io__raw_foo = m__io->read_substream(4);
    m_foo = new foo_t(m__io__raw_foo, this, m__root);

... that read_substream() would still need to take bit positions into account, and, probably there would be 2 distinct versions of that procedure - "byte-aligned" and "byte-unaligned".

@koczkatamas
Copy link
Member

Yes, currently the intermediate byte array is not just can be avoided, but must be avoided AFAIK as you cannot create a reader stream from a byte array (without additional non-ksy code) - related issue: #44.

Of course you could parse everything as a byte array and extract various bits via expressions and instances, but that way you would lose the declarative capabilities of Kaitai, so I would not recommend that method.

@GreyCat
Copy link
Member

GreyCat commented Jul 14, 2017

So, generally, it's not a question of "shall we do unaligned reads or not" — the question is a priority of that task. And, yeah, unfortunately, given that 99% of users probably can live without it, it's somewhat around mid-low.

@koczkatamas
Copy link
Member

I am just saying that I have a feeling that the current issue can be solved without unaligned reads (except the bX which we already support).

My biggest fear is that we want to do something which can be architecturally bad. So before going that way I want to be sure I understand the problem correctly.

For example let's say we need an aligned sub-stream: that implementation probably shouldn't copy the input byte array by shifting every byte or something like that. It just 'simply' have to clone the current bit alignment (eg. bitsLeft variable in the JS runtime) to the new sub-stream.

I never met a file format which used eg. bit-aligned byte arrays. For bit-aligned integers, the current bX implementation should be sufficient. The only drawback I see that it always uses a 64-bit integer, while eg. a b12 could be stored as just a short (u2, 16-bit, etc) integer, but this probably can be fixed easily.

@JaapAap
Copy link
Author

JaapAap commented Jul 15, 2017

@tamas: yes, indeed my problem can be dealt with using current functionality (that is, I have already extended the functionality by allowing for signed bit fields, see also #121 and #155). I might also look into the integer size issue you mentioned at some point.

@cugu cugu added the question label Sep 14, 2019
@cugu cugu closed this as completed Sep 16, 2019
@cugu
Copy link
Contributor

cugu commented Sep 16, 2019

I'll close this issue as is seems to be resolved or rejected. Please reopen if you feel this is wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants