diff --git a/archive/ar/ar_bsd.ksy b/archive/ar/ar_bsd.ksy new file mode 100644 index 000000000..edc373dc3 --- /dev/null +++ b/archive/ar/ar_bsd.ksy @@ -0,0 +1,120 @@ +meta: + id: ar_bsd + title: Unix ar archive (BSD/Darwin variant) + application: ar + file-extension: + - a # Unix/generic + - rlib # Rust + - deb # Debian binary package + - udeb # Debian binary package + xref: + justsolve: AR + mime: application/x-archive + wikidata: Q300839 + license: CC0-1.0 + imports: + - member_metadata + - space_padded_number +doc: | + The BSD variant of the Unix ar archive format (see the `ar_generic` spec for general info about the ar format). This variant is also used on Darwin-based systems (mainly Apple's macOS and iOS). + + BSD archives support member names that contain spaces or are longer than 16 bytes by storing the name as part of the member data rather than in the fixed-size name field. +doc-ref: | + https://en.wikipedia.org/w/index.php?title=Ar_(Unix)&oldid=880452895#File_format_details + https://docs.oracle.com/cd/E36784_01/html/E36873/ar.h-3head.html + https://llvm.org/docs/CommandGuide/llvm-ar.html#file-format + https://github.com/llvm/llvm-project/blob/llvmorg-7.0.1/llvm/lib/Object/Archive.cpp +seq: + - id: magic + -orig-id: ARMAG + contents: "!<arch>\n" + doc: Magic number. + - id: members + type: member + repeat: eos + doc: List of archive members. May be empty. +types: + regular_member_name: + seq: + - id: name + terminator: 0x20 + pad-right: 0x20 + doc: The member name, right-padded with spaces. + doc: | + A regular (or "short") member name, stored directly in the name field. + + Note: Since regular names in BSD archives are terminated using spaces, file names that contain spaces cannot be stored as regular names. Such names must be stored as long names, even if they are not longer than 16 bytes. + long_member_name: + seq: + - id: magic + contents: '#1/' + doc: Indicates a long member name. 
+ - id: name_size + type: space_padded_number(13, 10) + doc: The size of the long member name in bytes. + doc: A long member name, stored at the start of the member's data. + member_name: + seq: + - id: first_three_bytes + size: long_name_magic.length + doc: Internal helper field, do not use. + instances: + long_name_magic: + value: '[0x23, 0x31, 0x2f]' + doc: The ASCII bytes "#1/", indicating a long member name. + is_long: + value: first_three_bytes == long_name_magic + doc: Whether this is a reference to a long name (stored at the start of the member's data) or a regular name. + parsed: + pos: 0 + type: + switch-on: is_long + cases: + true: long_member_name + false: regular_member_name + member: + seq: + - id: name_internal + -orig-id: ar_name + size: 16 + type: member_name + doc: Internal helper field, do not use directly, use the `name` instance instead. + - id: metadata + type: member_metadata + doc: The member's metadata (timestamp, user and group ID, mode). + - id: size_raw + -orig-id: ar_size + size: 10 + type: space_padded_number(10, 10) + doc: Raw version of size_with_long_name. + - id: header_terminator + -orig-id: ar_fmag + contents: "`\n" + doc: Marks the end of the header. + - id: long_name + size: name_internal.parsed.as<long_member_name>.name_size.value + terminator: 0x00 + pad-right: 0x00 + if: name_internal.is_long + doc: The member's long name, if any, possibly right-padded with null bytes. + - id: data + size: size + doc: The member's data. + - id: padding + contents: "\n" + if: size_with_long_name % 2 != 0 + doc: An extra newline is added as padding after members with an odd data size. This ensures that all members are 2-byte-aligned. + instances: + name: + value: 'name_internal.is_long ? long_name : name_internal.parsed.as<regular_member_name>.name' + doc: | + The name of the archive member. Because the encoding of member names varies across systems, the name is exposed as a byte array. 
+ + Names are usually unique within an archive, but this is not required - the `ar` command even provides various options to work with archives containing multiple identically named members. + size_with_long_name: + value: size_raw.value + doc: The size of the member's data. The long member name (if any) counts toward this size value, but the trailing padding byte (if any) does not. + size: + value: 'name_internal.is_long ? size_with_long_name - name_internal.parsed.as<long_member_name>.name_size.value : size_with_long_name' + doc: The size of the member's data, excluding any long member name. + doc: An archive member's header and data. diff --git a/archive/ar/ar_generic.ksy b/archive/ar/ar_generic.ksy new file mode 100644 index 000000000..9dc639cea --- /dev/null +++ b/archive/ar/ar_generic.ksy @@ -0,0 +1,73 @@ +meta: + id: ar_generic + title: Unix ar archive (generic superset) + application: ar + file-extension: + - a # Unix/generic + - lib # Windows + - rlib # Rust + - deb # Debian binary package + - udeb # Debian binary package + xref: + justsolve: AR + mime: application/x-archive + wikidata: Q300839 + license: CC0-1.0 + imports: + - member_metadata + - space_padded_number + # The ar format is somewhat unusual: although it can store arbitrary data files, the ar format itself is text-based - all fields and magic numbers are pure ASCII. + # In particular, numerical values are stored as ASCII-encoded decimal and octal numbers, rather than packed byte values. Because of this, the ar format has no endianness. + # No string encoding is specified either. As different systems use different encodings, all text (i.e. file names) is exposed as byte arrays. +doc: | + The Unix ar archive format, as created by the `ar` utility. It is a simple uncompressed flat archive format, but is rarely used for general-purpose archiving. Instead, it is commonly used by linkers to collect multiple object files along with a symbol table into a static library. 
The Debian package format (.deb) is also based on the ar format. + + The ar format is not standardized and several variants have been developed, which differ mainly in how member names and the symbol table (if any) are stored. This specification describes the basic structure shared by all ar variants. +doc-ref: | + https://en.wikipedia.org/w/index.php?title=Ar_(Unix)&oldid=880452895#File_format_details + https://docs.oracle.com/cd/E36784_01/html/E36873/ar.h-3head.html + https://llvm.org/docs/CommandGuide/llvm-ar.html#file-format + https://github.com/llvm/llvm-project/blob/llvmorg-7.0.1/llvm/lib/Object/Archive.cpp +seq: + - id: magic + -orig-id: ARMAG + contents: "!<arch>\n" + doc: Magic number. + - id: members + type: member + repeat: eos + doc: List of archive members. May be empty. +types: + member: + seq: + - id: name + -orig-id: ar_name + size: 16 + # We don't set a terminator for the name field, because different ar format variants use different terminators (see doc). + doc: | + The name of the archive member, right-padded with spaces. Because the exact format of this field differs between format variants, it is exposed as a fixed-size byte array. Long member names are not processed, and no terminator or padding characters are removed. To read member names correctly from an archive whose format variant is known, use the `ar_bsd` or `ar_sysv` specification. + + Names are usually unique within an archive, but this is not required - the `ar` command even provides various options to work with archives containing multiple identically named members. + - id: metadata + type: member_metadata + doc: The member's metadata (timestamp, user and group ID, mode). + - id: size_raw + -orig-id: ar_size + type: space_padded_number(10, 10) + doc: Raw version of size. + - id: header_terminator + -orig-id: ar_fmag + contents: "`\n" + doc: Marks the end of the header. + - id: data + size: size + doc: The member's data. 
+ - id: padding + contents: "\n" + if: size % 2 != 0 + doc: An extra newline is added as padding after members with an odd data size. This ensures that all members are 2-byte-aligned. + instances: + size: + value: size_raw.value + doc: The size of the member's data. The trailing padding byte (if any) does not count toward the data size. + doc: An archive member's header and data. diff --git a/archive/ar/ar_gnu_thin.ksy b/archive/ar/ar_gnu_thin.ksy new file mode 100644 index 000000000..3241ccb21 --- /dev/null +++ b/archive/ar/ar_gnu_thin.ksy @@ -0,0 +1,136 @@ +meta: + id: ar_gnu_thin + title: GNU binutils thin ar archive + application: ar + file-extension: + - a + license: CC0-1.0 + imports: + - space_padded_number + - member_metadata +doc: | + The thin ar archive format, as created by the GNU binutils `ar` utility using the `T` flag. Thin archives are used by GNU binutils as a more efficient format for locally-created static libraries than the regular ar format. Thin archives only store the paths of all contained files (relative to the archive), but not the files' actual data - to read data from the archive, the original files need to be looked up and read. This makes thin archives unsuitable for general-purpose archiving (in fact, GNU `ar` does not support manually extracting thin archives); they are only meant to be used as a static library format. + + The internal structure of thin archives is very similar to regular System V/GNU ar archives, but the formats are not compatible. +doc-ref: https://sourceware.org/binutils/docs/binutils/ar.html +seq: + - id: magic + -orig-id: ARMAG + contents: "!<thin>\n" + doc: Magic number. + - id: members + type: member + repeat: eos + doc: List of archive members. May be empty. +instances: + long_name_list_name: + value: '[0x2f, 0x2f, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20]' + doc: The name of the special "long name list" member. 
This is a byte array containing "//" (two slashes) right-padded using 14 spaces (in ASCII). + long_name_list_index: + value: | + members.size > 0 and members[0].name_internal.raw == long_name_list_name ? 0 + : members.size > 1 and members[1].name_internal.raw == long_name_list_name ? 1 + : -1 + doc: | + The index of the special "long name list" member in the members array, or `-1` if this archive doesn't contain a long name list. + + Note: the long name list is only recognized if it is one of the first two archive members. This is because it always appears immediately after the symbol table (or if there is no symbol table, at the very beginning of the archive). + long_name_list: + value: members[long_name_list_index] + if: long_name_list_index != -1 + doc: A special archive member that holds a list of long names used by other archive members. (Optional, only present if the archive has members with long names.) +types: + long_member_name: + seq: + - id: slash + contents: "/" + - id: offset + type: space_padded_number(15, 10) + doc: The byte offset in the long name list at which the actual member name is stored. + instances: + name: + io: _root.long_name_list.data_internal._io + pos: offset.value + # The terminator is actually a slash followed by a newline, but multi-character terminators are not supported by Kaitai, and it's very unlikely that a path will contain a newline. + terminator: 0x0a + doc: The member name (actually a relative path) stored in the long name list, terminated by a slash and a newline. For technical reasons, includes the terminating slash (but not the newline). + doc: A long member name (actually a relative path), stored as a reference into the long name list. + special_member_name: + seq: + - id: name + terminator: 0x20 + pad-right: 0x20 + doc: The member name, as a byte array, right-padded using ASCII spaces. + doc: A "special" member name that does not follow the usual format. 
This kind of name is used for special members that do not represent a normal file, such as the symbol table (named "/") and the long name list (named "//"). + member_name: + seq: + - id: raw + size: 16 + doc: The name of the archive member as a 16-byte array, including any padding spaces at the end. + instances: + ascii_zero: + value: 0x30 + ascii_nine: + value: 0x39 + first_char: + pos: 0 + type: u1 + second_char: + pos: 1 + type: u1 + is_long: + value: first_char == 0x2f and second_char >= ascii_zero and second_char <= ascii_nine + parsed: + pos: 0 + type: + switch-on: is_long + cases: + true: long_member_name + false: special_member_name + member_data: + seq: + - id: data + size-eos: true + doc: Dummy type representing a member's data. This type is used instead of a normal byte array to allow "looking into" it using instances (this is needed to handle long member names). + member: + seq: + - id: name_internal + -orig-id: ar_name + size: 16 + type: member_name + doc: Internal helper field, do not use directly, use the `name` instance instead. + - id: metadata + type: member_metadata + doc: The member's metadata (timestamp, user and group ID, mode). + - id: size_raw + -orig-id: ar_size + type: space_padded_number(10, 10) + doc: Raw version of size. + - id: header_terminator + -orig-id: ar_fmag + contents: "`\n" + doc: Marks the end of the header. + - id: data_internal + type: member_data + size: size + if: not name_internal.is_long + doc: Internal helper field, do not use directly, use the `data` instance instead. + - id: padding + contents: "\n" + if: not name_internal.is_long and size % 2 != 0 + doc: An extra newline is added as padding after members with an odd data size. This ensures that all members are 2-byte-aligned. + instances: + name: + value: 'name_internal.is_long ? name_internal.parsed.as<long_member_name>.name : name_internal.parsed.as<special_member_name>.name' + doc: | + The name of the archive member. 
Because the encoding of member names varies across systems, the name is exposed as a byte array. + + Names are usually unique within an archive, but this is not required - the `ar` command even provides various options to work with archives containing multiple identically named members. + size: + value: size_raw.value + doc: The size of the member's data. The trailing padding byte (if any) does not count toward the data size. + data: + value: data_internal.data + if: not name_internal.is_long + doc: The member's data. Only present for special members. + doc: An archive member's header and data. diff --git a/archive/ar/ar_sysv.ksy b/archive/ar/ar_sysv.ksy new file mode 100644 index 000000000..35151ceb3 --- /dev/null +++ b/archive/ar/ar_sysv.ksy @@ -0,0 +1,159 @@ +meta: + id: ar_sysv + title: Unix ar archive (System V/GNU/Windows variant) + application: ar + file-extension: + - a # Unix/generic + - lib # Windows + - rlib # Rust + - deb # Debian binary package + - udeb # Debian binary package + xref: + justsolve: AR + mime: application/x-archive + wikidata: Q300839 + license: CC0-1.0 + imports: + - member_metadata + - space_padded_number +doc: | + The System V variant of the Unix ar archive format (see the `ar_generic` spec for general info about the ar format). This variant is also used on Linux and Windows systems. + + System V archives support member names that contain spaces by terminating the name field using a slash instead of a space. File names longer than 16 bytes are supported by storing the name in a special archive member called "//" and only storing a byte offset in the member name field. 
+doc-ref: | + https://en.wikipedia.org/w/index.php?title=Ar_(Unix)&oldid=880452895#File_format_details + https://docs.oracle.com/cd/E36784_01/html/E36873/ar.h-3head.html + https://llvm.org/docs/CommandGuide/llvm-ar.html#file-format + https://github.com/llvm/llvm-project/blob/llvmorg-7.0.1/llvm/lib/Object/Archive.cpp +seq: + - id: magic + -orig-id: ARMAG + contents: "!<arch>\n" + doc: Magic number. + - id: members + type: member + repeat: eos + doc: List of archive members. May be empty. +instances: + long_name_list_name: + value: '[0x2f, 0x2f, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20]' + doc: The name of the special "long name list" member. This is a byte array containing "//" (two slashes) right-padded using 14 spaces (in ASCII). + long_name_list_index: + value: | + members.size > 0 and members[0].name_internal.raw == long_name_list_name ? 0 + : members.size > 1 and members[1].name_internal.raw == long_name_list_name ? 1 + : members.size > 2 and members[2].name_internal.raw == long_name_list_name ? 2 + : -1 + doc: | + The index of the special "long name list" member in the members array, or `-1` if this archive doesn't contain a long name list. + + Note: the long name list is only recognized if it is one of the first three archive members. This is because it always appears immediately after the symbol table (or if there is no symbol table, at the very beginning of the archive). Windows archives can contain two symbol table members, so the long name list can be at most the third member. + long_name_list: + value: members[long_name_list_index] + if: long_name_list_index != -1 + doc: A special archive member that holds a list of long names used by other archive members. (Optional, only present if the archive has members with long names.) +types: + regular_member_name: + seq: + - id: name + terminator: 0x2f + pad-right: 0x20 + doc: The member name, terminated by a slash, and right-padded with spaces. 
+ doc: A regular (or "short") member name, stored directly in the name field. + long_member_name: + seq: + - id: slash + contents: "/" + - id: offset + type: space_padded_number(15, 10) + doc: The byte offset in the long name list at which the actual member name is stored. + instances: + name: + io: _root.long_name_list.data_internal._io + pos: offset.value + terminator: 0x2f + doc: The member name stored in the long name list, terminated by a slash. + doc: A long member name, stored as a reference into the long name list. + special_member_name: + seq: + - id: name + terminator: 0x20 + pad-right: 0x20 + doc: The member name, as a byte array, right-padded using ASCII spaces. + doc: A "special" member name that does not follow the usual format. This kind of name is used for special members that do not represent a normal file, such as the symbol table (named "/", or on 64-bit Solaris "/SYM64/") and the long name list (named "//"). + member_name: + seq: + - id: raw + size: 16 + doc: The name of the archive member as a 16-byte array, including any padding spaces at the end. + instances: + ascii_zero: + value: 0x30 + ascii_nine: + value: 0x39 + first_char: + pos: 0 + type: u1 + second_char: + pos: 1 + type: u1 + is_regular: + value: first_char != 0x2f + is_long: + value: first_char == 0x2f and second_char >= ascii_zero and second_char <= ascii_nine + parsed: + pos: 0 + type: + switch-on: 'is_regular ? 0 : is_long ? 1 : 2' + cases: + 0: regular_member_name + 1: long_member_name + 2: special_member_name + member_data: + seq: + - id: data + size-eos: true + doc: Dummy type representing a member's data. This type is used instead of a normal byte array to allow "looking into" it using instances (this is needed to handle long member names). + member: + seq: + - id: name_internal + -orig-id: ar_name + size: 16 + type: member_name + doc: Internal helper field, do not use directly, use the `name` instance instead. 
+ - id: metadata + type: member_metadata + doc: The member's metadata (timestamp, user and group ID, mode). + - id: size_raw + -orig-id: ar_size + type: space_padded_number(10, 10) + doc: Raw version of size. + - id: header_terminator + -orig-id: ar_fmag + contents: "`\n" + doc: Marks the end of the header. + - id: data_internal + type: member_data + size: size + doc: Internal helper field, do not use directly, use the `data` instance instead. + - id: padding + contents: "\n" + if: size % 2 != 0 + doc: An extra newline is added as padding after members with an odd data size. This ensures that all members are 2-byte-aligned. + instances: + name: + value: | + name_internal.is_regular ? name_internal.parsed.as<regular_member_name>.name + : name_internal.is_long ? name_internal.parsed.as<long_member_name>.name + : name_internal.parsed.as<special_member_name>.name + doc: | + The name of the archive member. Because the encoding of member names varies across systems, the name is exposed as a byte array. + + Names are usually unique within an archive, but this is not required - the `ar` command even provides various options to work with archives containing multiple identically named members. + size: + value: size_raw.value + doc: The size of the member's data. The trailing padding byte (if any) does not count toward the data size. + data: + value: data_internal.data + doc: The member's data. + doc: An archive member's header and data. diff --git a/archive/ar/member_metadata.ksy b/archive/ar/member_metadata.ksy new file mode 100644 index 000000000..2df75ec03 --- /dev/null +++ b/archive/ar/member_metadata.ksy @@ -0,0 +1,47 @@ +meta: + id: member_metadata + title: Unix ar archive member metadata + license: CC0-1.0 + imports: + - space_padded_number +doc: | + An archive member's metadata (timestamp, user and group ID, mode). + + Modern ar implementations support adding archive members in a reproducible mode: the original file's metadata is ignored, the timestamp, UID and GID are set to 0, and the mode to 644 (octal). 
This mode is usually enabled by default and must be explicitly disabled to store the real file metadata in the archive. + + Rarely, all fields in the metadata may be blank (only spaces). This is the case in particular for the '//' member (the long name list) of SysV archives. +seq: + - id: modified_timestamp_raw + -orig-id: ar_date + type: space_padded_number(12, 10) + doc: Unparsed version of modified_timestamp. + - id: user_id_raw + -orig-id: ar_uid + type: space_padded_number(6, 10) + doc: Unparsed version of user_id. + - id: group_id_raw + -orig-id: ar_gid + type: space_padded_number(6, 10) + doc: Unparsed version of group_id. + - id: mode_raw + -orig-id: ar_mode + type: space_padded_number(8, 8) + doc: Unparsed version of mode. (This number is stored in octal, unlike all other fields.) +instances: + modified_timestamp: + value: modified_timestamp_raw.value + doc: The member's modification time, as a Unix timestamp. + user_id: + value: user_id_raw.value + doc: The member's owner user ID. + group_id: + value: group_id_raw.value + doc: The member's owner group ID. + mode: + value: mode_raw.value + doc: | + The member's mode bits (file type and permissions). + + In practice, archive members are always regular files (file type S_IFREG). Implementations of the ar tool generally do not add non-regular files to archives - such files will either be rejected (e. g. directories) or be treated as regular files (e. g. symlinks). Technically, the ar format does not prohibit members with non-regular file type bits, but such members have no agreed format or semantics. + + Archive members added in reproducible mode will have their mode set to 644 (octal). Note that in this case the file type bits are all zeroes, unlike in non-reproducible mode where the file type is explicitly S_IFREG. Both cases represent regular files and should be considered equivalent. 
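The mode semantics documented in `member_metadata.ksy` above can be illustrated with a small Python sketch (not part of this patch). It assumes the `ar_mode` field has already been parsed from octal text into an integer; the helper names `split_mode` and `is_regular_member` are hypothetical and exist only for this illustration.

```python
import stat


def split_mode(mode):
    """Split a parsed ar_mode value into (file type bits, permission bits)."""
    return stat.S_IFMT(mode), stat.S_IMODE(mode)


def is_regular_member(mode):
    """Return True if the mode describes a regular file.

    Assumption taken from the `mode` doc: file type bits of zero
    (reproducible mode) are treated the same as an explicit S_IFREG.
    """
    file_type, _ = split_mode(mode)
    return file_type in (0, stat.S_IFREG)


# "644" as written in reproducible mode vs. "100644" with explicit S_IFREG.
reproducible = int("644", 8)
explicit = int("100644", 8)
print(split_mode(reproducible), is_regular_member(reproducible))
print(split_mode(explicit), is_regular_member(explicit))
```

Treating a zero file type as regular follows the note in the `mode` doc that both cases should be considered equivalent; a stricter consumer could instead reject anything other than these two values.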
diff --git a/archive/ar/space_padded_number.ksy b/archive/ar/space_padded_number.ksy new file mode 100644 index 000000000..844318f15 --- /dev/null +++ b/archive/ar/space_padded_number.ksy @@ -0,0 +1,25 @@ +meta: + id: space_padded_number + title: Fixed-size ASCII number field + license: CC0-1.0 + encoding: ASCII +doc: A number that is stored as ASCII text in a fixed-size field, padded using spaces. +params: + - id: size + type: u1 + doc: The (maximum) size of the field, in bytes. + - id: base + type: u1 + doc: The base of the number stored in the field (usually 10). +seq: + - id: text + size: size + type: str + terminator: 0x20 + pad-right: 0x20 + doc: The number in text form, right-padded with spaces. +instances: + value: + value: text.to_i(base) + if: text != "" + doc: The number, parsed as an integer. If the field is blank (all spaces), this instance is null. All other non-numeric contents are an error.
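As an illustration of the parsing rules in `space_padded_number.ksy` (a sketch outside the patch, not the code Kaitai generates), the Python function below mimics the spec: it keeps the text up to the first space, returns `None` for a blank field, and raises on any other non-numeric content. The name `parse_space_padded_number` is hypothetical, and the exact strings accepted by Python's `int()` (for example a leading sign) may differ from what `to_i` accepts in a given Kaitai target language.

```python
from typing import Optional


def parse_space_padded_number(field: bytes, base: int = 10) -> Optional[int]:
    """Parse a fixed-size, space-padded ASCII number field.

    Returns None for a blank field and raises ValueError for any other
    non-numeric content.
    """
    # Mirror `terminator: 0x20`: keep only the bytes before the first space.
    text = field.split(b" ", 1)[0].decode("ascii")
    if text == "":
        return None
    return int(text, base)


print(parse_space_padded_number(b"1234      "))    # 10-byte ar_size field, decimal
print(parse_space_padded_number(b"100644  ", 8))   # 8-byte ar_mode field, octal
print(parse_space_padded_number(b"      "))        # blank 6-byte field
```

Returning `None` for blank fields matches the `if: text != ""` guard on the `value` instance, which matters for members such as the SysV `//` long name list whose metadata fields may be all spaces.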