Question on switch functionality #207

JaapAap · 2017-07-14T16:01:57Z

Suppose I have a set of messages that contain a 12 bit preamble with the message ID followed by a message body of 160 bits, with different contents for different messages. Can the current switch functionality support such a case, in which the message body does not start on a byte-boundary?

GreyCat · 2017-07-14T16:09:31Z

It's not really related to switch, rather than it is an issue of alignment (see #12).

Right now we've implemented a very simple alignment scheme, that is everything is byte-aligned, except for sequences of types that are known to require bit-level alignment (that is basically bX types). So, if you do:

seq:
  - id: one
    type: b1
  - id: two
    type: b7

that compiles into

    m_one = m__io->read_bits_int(1);
    m_two = m__io->read_bits_int(7);

but as soon as you'll interject these bX with some other stuff, like u1, it would automatically insert byte alignment statement, i.e. this one:

seq:
  - id: one
    type: b1
  - id: cccombo_breaker
    type: u1
  - id: two
    type: b7

compiles into:

    m_one = m__io->read_bits_int(1);
    m__io->align_to_byte(); // <================ this one!
    m_cccombo_breaker = m__io->read_u1();
    m_two = m__io->read_bits_int(7);

As you might have guessed, both "switch types" and "user types" fall into "other stuff" category, i.e. they would invoke byte alignment statement. So, we need some sort of flexible alignment specification system that would at least say which types / attributes require which alignment statements before / after them.

So, I suggest to continue this discussion in #12, if it's relevant.

JaapAap · 2017-07-14T16:35:22Z

Happy to discuss elsewhere, but that discussion mainly relates to specifying ways to force certain types off byte alignment, while I am having a bit-alignment issue. Even if the align_to_byte(); call would be a no-op, if I understand well, the read_bytes routine still assumes that it starts on a byte boundary (obviously the same problem is true for read_u1() etc. So in my case, I would need all alignments to be turned of.

If the idea is to -in the end - replace the current 'automatic' alignment by 'user-defined' alignment and supporting a 'no-alignment' case would become possible that would be great. While these issues are not orthogonal, I am not convinced that the existing discussion is the right place to continue. Thoughts?

koczkatamas · 2017-07-14T16:47:26Z

Could you show me how would you parse those 160 bits? As 20 bytes? So basically it's a data format which stores every byte shifted by 4 bits?

pX = preamble bit X
bX = data bit X

           [       --- byte 1 ---      ] [       --- byte 2 ---      ]
input : ...[p8|p9|p10|p11 | b0|b1|b2|b3] [b4|b5|b6|b7 | b8|b9|b10|b11]...
output:                    [b0|b1|b2|b3 | b4|b5|b6|b7] [b8|b9|b10|b11...
                           [      --- byte 0 ---     ] [      --- byte 1 ---     ]

GreyCat · 2017-07-14T16:47:44Z

Yeah, you're completely right, it's not a very trivial issue. read_bytes needs to be adapted — or, rather, compiler needs to call either read_bytes (which is byte-aligned version) or something like read_bytes_unaligned (which should cleverly shift all bytes read and prepend remaining bit buffer, if it exists). So, yeah, we'll need "unaligned" versions of all reading primitives as well.

My idea is actually this and that issue have a lot in common. Basically, the difference in generated code should be insertion/abscence of certain align(...) calls before and after reading calls — be it:

multi-byte-level alignment,
1-byte alignment,
multi-bit-level alignment,
or no alignment at all (effectively a 1-bit alignment)

That, and, of course, bit-level reading primitives. So my thought was to make it look the same for end-user.

JaapAap · 2017-07-14T17:28:14Z

Thanks for that - indeed, I agree that this approach addresses both issues.

@tamas: the 160 bits contain a variety of bit-fields. Essentially, the data structure in my example is 172 bits of which the first 12 bits contain the ID which determines which of a number of message formats to select. This requires me to treat the 12 bits ID separately as I can't simply parse them as part of each of the messages (hence my reference to the switch).

koczkatamas · 2017-07-14T17:43:44Z

I asked this because the byte arrays are usually byte-aligned, so maybe we don't need a read_bytes_unaligned method here. And bX types are already parsed as integers, so if the align_to_byte call wouldn't be there before the switch then maybe you could already use the bX types to parse your data structure. (Of course if you reach a byte boundary you can switch to uX from there.)

Edit

So if we look at this example:

meta:
  id: testalign
seq:
  - id: preamble
    type: b12
  - id: data
    type:
      switch-on: preamble
      cases:
        0x504: type1
types:
  type1:
    seq:
      - id: field1
        type: b4
      - id: field2
        type: b8

Currently this generates the following code (alignToByte is not commented out):

If I comment the alignToByte out and re-parse the input (you can do this in the WebIDE by pressing Ctrl+Enter in the JS window), then it will read the field1 from the b13..b16 bits.

If my input is 50 4b 03, then this is the parsed data structure:

JaapAap · 2017-07-14T17:57:21Z

Thanks Tamas, I understand what you are aiming at and thanks for showing me how to achieve it.

GreyCat · 2017-07-14T18:02:44Z

@JaapAap There is an initiative to migrate from these intermediate byte arrays to "asking for substreams" directly in #44. But this still needs that bit-level support anyway. Even if instead of:

    m__raw_foo = m__io->read_bytes(4);
    m__io__raw_foo = new kaitai::kstream(m__raw_foo);
    m_foo = new foo_t(m__io__raw_foo, this, m__root);

we'll be doing something like:

    m__io__raw_foo = m__io->read_substream(4);
    m_foo = new foo_t(m__io__raw_foo, this, m__root);

... that read_substream() would still need to take bit positions into account, and, probably there would be 2 distinct versions of that procedure - "byte-aligned" and "byte-unaligned".

koczkatamas · 2017-07-14T18:03:02Z

Yes, currently the intermediate byte array is not just can be avoided, but must be avoided AFAIK as you cannot create a reader stream from a byte array (without additional non-ksy code) - related issue: #44.

Of course you could parse everything as a byte array and extract various bits via expressions and instances, but that way you would lose the declarative capabilities of Kaitai, so I would not recommend that method.

GreyCat · 2017-07-14T18:05:03Z

So, generally, it's not a question of "shall we do unaligned reads or not" — the question is a priority of that task. And, yeah, unfortunately, given that 99% of users probably can live without it, it's somewhat around mid-low.

koczkatamas · 2017-07-14T18:18:20Z

I am just saying that I have a feeling that the current issue can be solved without unaligned reads (except the bX which we already support).

My biggest fear is that we want to do something which can be architecturally bad. So before going that way I want to be sure I understand the problem correctly.

For example let's say we need an aligned sub-stream: that implementation probably shouldn't copy the input byte array by shifting every byte or something like that. It just 'simply' have to clone the current bit alignment (eg. bitsLeft variable in the JS runtime) to the new sub-stream.

I never met a file format which used eg. bit-aligned byte arrays. For bit-aligned integers, the current bX implementation should be sufficient. The only drawback I see that it always uses a 64-bit integer, while eg. a b12 could be stored as just a short (u2, 16-bit, etc) integer, but this probably can be fixed easily.

JaapAap · 2017-07-15T10:18:04Z

@tamas: yes, indeed my problem can be dealt with using current functionality (that is, I have already extended the functionality by allowing for signed bit fields, see also #121 and #155). I might also look into the integer size issue you mentioned at some point.

cugu · 2019-09-16T18:22:12Z

I'll close this issue as is seems to be resolved or rejected. Please reopen if you feel this is wrong.

cugu added the question label Sep 14, 2019

cugu closed this as completed Sep 16, 2019

generalmimon mentioned this issue Oct 4, 2023

un-aligned, bit-based data stream #576

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question on switch functionality #207

Question on switch functionality #207

JaapAap commented Jul 14, 2017

GreyCat commented Jul 14, 2017

JaapAap commented Jul 14, 2017

koczkatamas commented Jul 14, 2017

GreyCat commented Jul 14, 2017

JaapAap commented Jul 14, 2017

koczkatamas commented Jul 14, 2017 •

edited

Loading

JaapAap commented Jul 14, 2017 •

edited

Loading

GreyCat commented Jul 14, 2017

koczkatamas commented Jul 14, 2017

GreyCat commented Jul 14, 2017

koczkatamas commented Jul 14, 2017

JaapAap commented Jul 15, 2017

cugu commented Sep 16, 2019

Question on switch functionality #207

Question on switch functionality #207

Comments

JaapAap commented Jul 14, 2017

GreyCat commented Jul 14, 2017

JaapAap commented Jul 14, 2017

koczkatamas commented Jul 14, 2017

GreyCat commented Jul 14, 2017

JaapAap commented Jul 14, 2017

koczkatamas commented Jul 14, 2017 • edited Loading

Edit

JaapAap commented Jul 14, 2017 • edited Loading

GreyCat commented Jul 14, 2017

koczkatamas commented Jul 14, 2017

GreyCat commented Jul 14, 2017

koczkatamas commented Jul 14, 2017

JaapAap commented Jul 15, 2017

cugu commented Sep 16, 2019

koczkatamas commented Jul 14, 2017 •

edited

Loading

JaapAap commented Jul 14, 2017 •

edited

Loading