-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "bit-endian" and "tags" fields to the schema and document fields in the "MetaSpec" #16
base: master
Are you sure you want to change the base?
Conversation
Ok, seems that all comments addressed, ready to merge |
fce3421
to
e077399
Compare
@generalmimon , could you find a time to look at this again 😃 |
I've finish with this, can you give your last words? |
ksy_schema.json
Outdated
"bit-endian": { | ||
"enum": [ "le", "be" ], | ||
"default": "be", | ||
"description": "specifies the default byte order of integers, that unpacked to the bit fields, used in all fields of type `bX` of the current type and all its subtypes\n\nbit-sized integers are readed by reading a byte-sized integers of size `ceil(X/8)` in the specified endian and unpacking bit fields from it\n\nfor better understanding algorithm of parsing please read https://doc.kaitai.io/user_guide.html#_bit_sized_integers" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specifies the default byte order of integers, that unpacked to the bit fields, (...)
bit-sized integers are readed by reading a byte-sized integers of size
ceil(X/8)
in the specified endian and unpacking bit fields from it
-
Please avoid using the term bit field in connection with Kaitai Struct. It is ambiguous, because it isn't clear if you refer to the individual indivisible bit-sized values (i.e.
type: bX
members) or the entire compound structure consisting of several bit values, which would be in line with the established meaning from C and C++. On the other hand, we use a field in KS for referring to theseq
item, which would indicate the former meaning.When I don't want to use bit(-sized) integers or bit(-sized) values everywhere, I sometimes use bit field members, which does not suffer from this problem (it clearly indicates the individual values).
-
The word readed doesn't exist in Standard English. The past form of the verb read /r:id/ is read /red/.
-
What is "a byte-sized integers"? It has to be either "a byte-sized integer
s" or "abyte-sized integers". -
First, this is an implementation detail which isn't really relevant to the user. I consider it rather confusing and misleading in this case. Second, the formulation you use is oversimplified and inaccurate.
It doesn't exactly work the way how unpacking members of bit fields is usually done in parsers: reading a single multi-byte integer in the specified endianness and subsequently extracting all the bit values using
&
and>>
operators with precalculated bitmasks and bit shifts. I mean that this specmeta: bit-endian: be seq: - id: version type: b4 - id: len_header type: b4
does not get transpiled into this:
Ahoj.prototype._read = function() { this._flags = this._io.readU1(); this.version = (this._flags & 0b1111_0000) >>> 4; this.lenHeader = (this._flags & 0b0000_1111) >>> 0; }
but this:
Ahoj.prototype._read = function() { this.version = this._io.readBitsIntBe(4); this.lenHeader = this._io.readBitsIntBe(4); }
You may argue that all the reading of byte-sized integers, AND-ing and right-shifting is still done under the hood of the
readBitsIntBe
method. That's true, but note that happens at run time, not compile time. The bit masks and shifts are not precalculated and that makes a huge difference, because you can parse entire unaligned bit streams (e.g. consider deflate and bzip2) and you can make decisions on the fly whether you want to parse one or the other bit value (depending on the values you've read before).This makes the description "reading a byte-sized integer and unpacking bit fields from it" pretty useless. In a unaligned bit stream, you don't care about any byte-sized integers - you simply have a continuous stream of bits, and sometimes you read 3 of them and sometimes 25. In this case, the presence of bytes is rather annoying than helpful (in fact, you don't need them at all) and if our implementation of
readBitsInt{Be,Le}()
methods deals with them, it's just an implementation detail. That's why I would like to avoid callingbit-endian
just as a byte order. This description is also insufficient - it says nothing about the layout of bit-sized values within the bytes.I guess the only viable way of describing our bit endianness is as a bit layout. It is a method of how the parsed bit-sized values are laid out within each byte and across bytes (if needed).
- If you choose
bit-endian: be
, KS assumes that bit values start at the most significant bit (hereinafter MSB, the other end is LSB) and follow one another consecutively up to the LSB of the first byte, while the next bit after the 1st byte's LSB is the MSB of the 2nd byte. - If you opt for
bit-endian: le
, KS assumes that bit values start at the 1st byte's LSB and follow one another consecutively up to the MSB, while the next bit after the 1st byte's MSB is the LSB of the 2nd byte.
- If you choose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@generalmimon , then please can you formulate brief and correctly enough description that we can put there?
ksy_schema.json
Outdated
"license": { "type": "string" }, | ||
"tags": { | ||
"type": "array", | ||
"description": "list of some identifiers that can give additional information about the format\n\nshould be written in `lowercase-kebab-case` and listed in alphabetical order\n\nshould be used only at the top level", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pondering over this one:
should be written in
lowercase-kebab-case
and listed in alphabetical order
The key meta/tags
is currently used on the KSF site - a KSY spec is included into a category according to the folder in which is it located or if it has the corresponding tag
matching the name of the folder. We opted for snake_case
for the names of all .ksy files and directories (machine_code
), so I suppose meta/tags
should also use it?
If you create a format with this:
meta:
tags:
- machine_code
it will be added into the 🏭 CPU / Machine Code Disassembly category at formats.kaitai.io.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. There was no examples of multi-word tags, so I assumed that kebab-case would be convenient.
ksy_schema.json
Outdated
"android", | ||
"archive", | ||
"database", | ||
"dos", | ||
"executable", | ||
"filesystem", | ||
"firmware", | ||
"linux", | ||
"log", | ||
"macos", | ||
"media", | ||
"windows" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many of them. 2-3 will suffice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is list of all existing tags, so that users do not introduce new tags, and preferably use existing ones
@generalmimon , somehow I missed your last comments here. I fixed all your mentioned issues (I use descriptions provided by you) except |
…n" and "tags" Co-authored-by: Petr Pucil <[email protected]>
No description provided.