Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "bit-endian" and "tags" fields to the schema and document fields in the "MetaSpec" #16

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Mingun
Copy link
Contributor

@Mingun Mingun commented Nov 25, 2020

No description provided.

ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
@Mingun
Copy link
Contributor Author

Mingun commented Nov 27, 2020

Ok, seems that all comments addressed, ready to merge

ksy_schema.json Outdated Show resolved Hide resolved
@Mingun Mingun requested a review from generalmimon November 27, 2020 19:24
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
@Mingun Mingun force-pushed the patch-1 branch 2 times, most recently from fce3421 to e077399 Compare November 30, 2020 07:01
@Mingun Mingun requested a review from generalmimon November 30, 2020 07:29
@Mingun
Copy link
Contributor Author

Mingun commented Dec 17, 2020

@generalmimon , could you find a time to look at this again 😃

ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
@Mingun
Copy link
Contributor Author

Mingun commented Dec 22, 2020

I've finish with this, can you give your last words?

ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated
"bit-endian": {
"enum": [ "le", "be" ],
"default": "be",
"description": "specifies the default byte order of integers, that unpacked to the bit fields, used in all fields of type `bX` of the current type and all its subtypes\n\nbit-sized integers are readed by reading a byte-sized integers of size `ceil(X/8)` in the specified endian and unpacking bit fields from it\n\nfor better understanding algorithm of parsing please read https://doc.kaitai.io/user_guide.html#_bit_sized_integers"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specifies the default byte order of integers, that unpacked to the bit fields, (...)

bit-sized integers are readed by reading a byte-sized integers of size ceil(X/8) in the specified endian and unpacking bit fields from it

  1. Please avoid using the term bit field in connection with Kaitai Struct. It is ambiguous, because it isn't clear if you refer to the individual indivisible bit-sized values (i.e. type: bX members) or the entire compound structure consisting of several bit values, which would be in line with the established meaning from C and C++. On the other hand, we use a field in KS for referring to the seq item, which would indicate the former meaning.

    When I don't want to use bit(-sized) integers or bit(-sized) values everywhere, I sometimes use bit field members, which does not suffer from this problem (it clearly indicates the individual values).

  2. The word readed doesn't exist in Standard English. The past form of the verb read /r:id/ is read /red/.

  3. What is "a byte-sized integers"? It has to be either "a byte-sized integers" or "a byte-sized integers".

  4. First, this is an implementation detail which isn't really relevant to the user. I consider it rather confusing and misleading in this case. Second, the formulation you use is oversimplified and inaccurate.

    It doesn't exactly work the way how unpacking members of bit fields is usually done in parsers: reading a single multi-byte integer in the specified endianness and subsequently extracting all the bit values using & and >> operators with precalculated bitmasks and bit shifts. I mean that this spec

    meta:
      bit-endian: be
    seq:
      - id: version
        type: b4
      - id: len_header
        type: b4

    does not get transpiled into this:

      Ahoj.prototype._read = function() {
        this._flags = this._io.readU1();
        this.version = (this._flags & 0b1111_0000) >>> 4;
        this.lenHeader = (this._flags & 0b0000_1111) >>> 0;
      }

    but this:

      Ahoj.prototype._read = function() {
        this.version = this._io.readBitsIntBe(4);
        this.lenHeader = this._io.readBitsIntBe(4);
      }

    You may argue that all the reading of byte-sized integers, AND-ing and right-shifting is still done under the hood of the readBitsIntBe method. That's true, but note that happens at run time, not compile time. The bit masks and shifts are not precalculated and that makes a huge difference, because you can parse entire unaligned bit streams (e.g. consider deflate and bzip2) and you can make decisions on the fly whether you want to parse one or the other bit value (depending on the values you've read before).

    This makes the description "reading a byte-sized integer and unpacking bit fields from it" pretty useless. In a unaligned bit stream, you don't care about any byte-sized integers - you simply have a continuous stream of bits, and sometimes you read 3 of them and sometimes 25. In this case, the presence of bytes is rather annoying than helpful (in fact, you don't need them at all) and if our implementation of readBitsInt{Be,Le}() methods deals with them, it's just an implementation detail. That's why I would like to avoid calling bit-endian just as a byte order. This description is also insufficient - it says nothing about the layout of bit-sized values within the bytes.

    I guess the only viable way of describing our bit endianness is as a bit layout. It is a method of how the parsed bit-sized values are laid out within each byte and across bytes (if needed).

    • If you choose bit-endian: be, KS assumes that bit values start at the most significant bit (hereinafter MSB, the other end is LSB) and follow one another consecutively up to the LSB of the first byte, while the next bit after the 1st byte's LSB is the MSB of the 2nd byte.
    • If you opt for bit-endian: le, KS assumes that bit values start at the 1st byte's LSB and follow one another consecutively up to the MSB, while the next bit after the 1st byte's MSB is the LSB of the 2nd byte.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@generalmimon , then please can you formulate brief and correctly enough description that we can put there?

ksy_schema.json Outdated
"license": { "type": "string" },
"tags": {
"type": "array",
"description": "list of some identifiers that can give additional information about the format\n\nshould be written in `lowercase-kebab-case` and listed in alphabetical order\n\nshould be used only at the top level",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pondering over this one:

should be written in lowercase-kebab-case and listed in alphabetical order

The key meta/tags is currently used on the KSF site - a KSY spec is included into a category according to the folder in which is it located or if it has the corresponding tag matching the name of the folder. We opted for snake_case for the names of all .ksy files and directories (machine_code), so I suppose meta/tags should also use it?

If you create a format with this:

meta:
  tags:
    - machine_code

it will be added into the 🏭 CPU / Machine Code Disassembly category at formats.kaitai.io.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. There was no examples of multi-word tags, so I assumed that kebab-case would be convenient.

ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated Show resolved Hide resolved
ksy_schema.json Outdated
Comment on lines 232 to 247
"android",
"archive",
"database",
"dos",
"executable",
"filesystem",
"firmware",
"linux",
"log",
"macos",
"media",
"windows"
Copy link
Member

@generalmimon generalmimon Dec 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many of them. 2-3 will suffice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is list of all existing tags, so that users do not introduce new tags, and preferably use existing ones

@bersbersbers
Copy link

I anyone continuing work on this? I am getting false positives in VS Code from missing bit-endian in the schema:

image

@Mingun
Copy link
Contributor Author

Mingun commented Jun 24, 2023

@generalmimon , somehow I missed your last comments here. I fixed all your mentioned issues (I use descriptions provided by you) except bit-endian where I do not know what to write to be correct. Could you help with this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants