Add "bit-endian" and "tags" fields to the schema and document fields in the "MetaSpec" #16

Mingun · 2020-11-25T16:09:14Z

No description provided.

Mingun · 2020-11-27T19:18:49Z

OK, it seems that all comments have been addressed; ready to merge.

Mingun · 2020-12-17T19:30:41Z

@generalmimon, could you find time to look at this again? 😃

Mingun · 2020-12-22T14:50:25Z

I've finished with this; can you give your final word?

generalmimon · 2020-12-26T20:52:21Z

ksy_schema.json

+        "bit-endian": {
+          "enum": [ "le", "be" ],
+          "default": "be",
+          "description": "specifies the default byte order of integers, that unpacked to the bit fields, used in all fields of type `bX` of the current type and all its subtypes\n\nbit-sized integers are readed by reading a byte-sized integers of size `ceil(X/8)` in the specified endian and unpacking bit fields from it\n\nfor better understanding algorithm of parsing please read https://doc.kaitai.io/user_guide.html#_bit_sized_integers"


specifies the default byte order of integers, that unpacked to the bit fields, (...)

bit-sized integers are readed by reading a byte-sized integers of size ceil(X/8) in the specified endian and unpacking bit fields from it

Please avoid using the term bit field in connection with Kaitai Struct. It is ambiguous, because it isn't clear whether you refer to the individual indivisible bit-sized values (i.e. type: bX members) or to the entire compound structure consisting of several bit values, which would be in line with the established meaning from C and C++. On the other hand, in KS we use the word field to refer to a seq item, which would indicate the former meaning.

When I don't want to use bit(-sized) integers or bit(-sized) values everywhere, I sometimes use bit field members, which does not suffer from this problem (it clearly indicates the individual values).

The word readed doesn't exist in Standard English. The past form of the verb read /riːd/ is read /red/.

What is "a byte-sized integers"? It has to be either "a byte-sized integer" or "byte-sized integers".

First, this is an implementation detail which isn't really relevant to the user. I consider it rather confusing and misleading in this case. Second, the formulation you use is oversimplified and inaccurate.

It doesn't exactly work the way unpacking members of bit fields is usually done in parsers: reading a single multi-byte integer in the specified endianness and subsequently extracting all the bit values using the & and >> operators with precalculated bitmasks and bit shifts. I mean that this spec

    meta:
      bit-endian: be
    seq:
      - id: version
        type: b4
      - id: len_header
        type: b4

does not get transpiled into this:

    Ahoj.prototype._read = function() {
      this._flags = this._io.readU1();
      this.version = (this._flags & 0b1111_0000) >>> 4;
      this.lenHeader = (this._flags & 0b0000_1111) >>> 0;
    }

but this:

    Ahoj.prototype._read = function() {
      this.version = this._io.readBitsIntBe(4);
      this.lenHeader = this._io.readBitsIntBe(4);
    }

You may argue that all the reading of byte-sized integers, AND-ing and right-shifting is still done under the hood of the readBitsIntBe method. That's true, but note that this happens at run time, not at compile time. The bit masks and shifts are not precalculated, and that makes a huge difference: you can parse entire unaligned bit streams (e.g. consider deflate and bzip2), and you can decide on the fly whether to parse one bit value or another (depending on the values you've read before).

This makes the description "reading a byte-sized integer and unpacking bit fields from it" pretty useless. In an unaligned bit stream, you don't care about any byte-sized integers: you simply have a continuous stream of bits, and sometimes you read 3 of them and sometimes 25. In this case, the presence of bytes is more annoying than helpful (in fact, you don't need them at all), and if our implementation of the readBitsInt{Be,Le}() methods deals with them, that's just an implementation detail. That's why I would like to avoid describing bit-endian merely as a byte order. That description is also insufficient: it says nothing about the layout of bit-sized values within the bytes.
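To make the run-time point concrete, here is a minimal sketch of a bit reader that tracks its position in bits. It is an assumption-laden illustration, not the Kaitai Struct runtime: the class name `BitReaderBe`, its `readBits` method and the sample bytes are all hypothetical, and it only covers the big-endian layout.

```javascript
// Hypothetical helper for illustration; NOT the Kaitai Struct runtime.
// It keeps an absolute position in bits, so reads need not be byte-aligned.
class BitReaderBe {
  constructor(bytes) {
    this.bytes = bytes;
    this.bitPos = 0; // position in bits from the start of the stream
  }

  readBits(n) {
    let value = 0;
    for (let i = 0; i < n; i++) {
      const byte = this.bytes[this.bitPos >> 3];
      if (byte === undefined) throw new RangeError("end of stream");
      const bit = (byte >> (7 - (this.bitPos & 7))) & 1; // MSB first
      value = value * 2 + bit; // multiply instead of `<<` to avoid 32-bit overflow
      this.bitPos++;
    }
    return value;
  }
}

// Sample stream: 0xA6 = 1010 0110, 0xC0 = 1100 0000
const reader = new BitReaderBe(Uint8Array.of(0xA6, 0xC0));
const kind = reader.readBits(3); // bits 101 -> 5
// The width of the next value is chosen at run time from what was just read,
// and the 7-bit read crosses the byte boundary.
const payload = kind === 5 ? reader.readBits(7) : reader.readBits(2); // bits 0011011 -> 27
```

No masks or shifts are fixed ahead of time here; the second read is only decided once `kind` is known, which a precalculated "read one integer and unpack it" scheme cannot express.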

I guess the only viable way of describing our bit endianness is as a bit layout. It is a method of how the parsed bit-sized values are laid out within each byte and across bytes (if needed).

If you choose bit-endian: be, KS assumes that bit values start at the most significant bit (hereinafter MSB, the other end is LSB) and follow one another consecutively up to the LSB of the first byte, while the next bit after the 1st byte's LSB is the MSB of the 2nd byte.

If you opt for bit-endian: le, KS assumes that bit values start at the 1st byte's LSB and follow one another consecutively up to the MSB, while the next bit after the 1st byte's MSB is the LSB of the 2nd byte.
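The two layouts described above can be sketched side by side. This is only an illustration under my own assumptions: the function `readBits` and its parameters are hypothetical names, not part of any Kaitai Struct runtime, and the code reads bit by bit purely for clarity.

```javascript
// Hypothetical illustration; names are not part of any Kaitai Struct runtime.
// Reads `n` bits starting at absolute bit offset `pos` under the given layout.
function readBits(bytes, pos, n, bitEndian) {
  let value = 0;
  for (let i = 0; i < n; i++) {
    const byte = bytes[(pos + i) >> 3];
    const k = (pos + i) & 7; // index of the bit within its byte, in reading order
    if (bitEndian === "be") {
      // be: reading order runs MSB -> LSB; earlier bits are more significant
      value = value * 2 + ((byte >> (7 - k)) & 1);
    } else {
      // le: reading order runs LSB -> MSB; earlier bits are less significant
      value += ((byte >> k) & 1) * 2 ** i;
    }
  }
  return value;
}

const bytes = Uint8Array.of(0xA5); // 1010 0101

// Two consecutive 4-bit values, as in the version / len_header example:
const be = [readBits(bytes, 0, 4, "be"), readBits(bytes, 4, 4, "be")]; // [10, 5]
const le = [readBits(bytes, 0, 4, "le"), readBits(bytes, 4, 4, "le")]; // [5, 10]
```

With the same byte, `be` takes the first value from the high nibble and `le` takes it from the low nibble, which is the difference the schema description needs to convey.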

@generalmimon, could you then please formulate a brief and sufficiently correct description that we can put there?

generalmimon · 2020-12-26T21:33:19Z

ksy_schema.json

-        "license": { "type": "string" },
+        "tags": {
+          "type": "array",
+          "description": "list of some identifiers that can give additional information about the format\n\nshould be written in `lowercase-kebab-case` and listed in alphabetical order\n\nshould be used only at the top level",


I'm pondering over this one:

should be written in lowercase-kebab-case and listed in alphabetical order

The key meta/tags is currently used on the KSF site: a KSY spec is included in a category according to the folder in which it is located, or if it has a tag matching the name of that folder. We opted for snake_case for the names of all .ksy files and directories (machine_code), so I suppose meta/tags should also use it?

If you create a format with this:

    meta:
      tags:
        - machine_code

it will be added into the 🏭 CPU / Machine Code Disassembly category at formats.kaitai.io.
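As a rough sketch of the rule described above (folder name or matching tag), something like the following could compute the categories of a spec. Everything here is an assumption for illustration: the function names, the idea that the category is the first path segment, and the sample path are hypothetical, and the real site generator may work differently.

```javascript
// Hypothetical sketch; the actual formats.kaitai.io generator may differ.
// A spec belongs to a category if it lives in that folder or lists it in meta/tags.
function categoriesFor(specPath, meta) {
  const folder = specPath.split("/")[0]; // assumption: category = top-level folder
  const tags = (meta && meta.tags) || [];
  return [...new Set([folder, ...tags])].sort();
}

// Checks the snake_case convention used for .ksy file and directory names.
const isSnakeCase = (tag) => /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/.test(tag);

// "executable/example.ksy" is a made-up path used only for this example.
const cats = categoriesFor("executable/example.ksy", { tags: ["machine_code"] });
// cats -> ["executable", "machine_code"]
```

If tags have to match folder names like this, `isSnakeCase("machine_code")` holds while `isSnakeCase("machine-code")` does not, which is the argument for snake_case over kebab-case.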

Maybe. There were no examples of multi-word tags, so I assumed that kebab-case would be convenient.

generalmimon · 2020-12-26T21:51:32Z

ksy_schema.json

+            "android",
+            "archive",
+            "database",
+            "dos",
+            "executable",
+            "filesystem",
+            "firmware",
+            "linux",
+            "log",
+            "macos",
+            "media",
+            "windows"


Too many of them. 2-3 will suffice.

Actually, this is the list of all existing tags, so that users do not introduce new tags and preferably use existing ones.

bersbersbers · 2022-11-02T08:30:17Z

Is anyone continuing work on this? I am getting false positives in VS Code because bit-endian is missing from the schema.

Mingun · 2023-06-24T19:04:46Z

@generalmimon, somehow I missed your last comments here. I have fixed all the issues you mentioned (using the descriptions you provided), except for bit-endian, where I do not know what to write to be correct. Could you help with this?

dgelessus reviewed Nov 25, 2020

Mingun force-pushed the patch-1 branch from 7660158 to 9cbad48 Compare November 27, 2020 14:19

Mingun requested a review from dgelessus November 27, 2020 14:22

generalmimon reviewed Nov 27, 2020

Mingun requested a review from generalmimon November 27, 2020 19:24

generalmimon requested changes Nov 27, 2020

Mingun force-pushed the patch-1 branch 2 times, most recently from fce3421 to e077399 Compare November 30, 2020 07:01

Mingun requested a review from generalmimon November 30, 2020 07:29

generalmimon reviewed Dec 17, 2020

generalmimon reviewed Dec 17, 2020

generalmimon reviewed Dec 17, 2020

Mingun requested a review from generalmimon December 21, 2020 12:32

Mingun force-pushed the patch-1 branch from 1348dfb to 788ab04 Compare December 22, 2020 14:48

generalmimon requested changes Dec 26, 2020

Add missing documentation to keys in the MetaSpec, and add "bit-endian" and "tags" (4b3b0d3)

Co-authored-by: Petr Pucil

Mingun force-pushed the patch-1 branch from 788ab04 to 4b3b0d3 Compare June 24, 2023 19:05

Mingun requested a review from generalmimon June 24, 2023 19:05
