Add the different variants of the Unix ar format #126

Open
wants to merge 19 commits into
base: master
93ce39f
Add specs for Unix ar archives (generic and System V variants)
dgelessus Mar 4, 2019
0ea56cf
Add BSD variant of the Unix ar format
dgelessus Mar 6, 2019
fde4630
Clean up some minor things in generic and SysV ar specs
dgelessus Mar 6, 2019
09b55c1
Remove unused types from ar_bsd
dgelessus Mar 7, 2019
6f95f82
Replace member_name type switch with separate conditional instances
dgelessus Mar 7, 2019
b5ca335
Remove some leftover copy-paste junk
dgelessus Mar 7, 2019
afc04b5
Group ar_bsd name kinds into separate types
dgelessus Mar 7, 2019
a4ccf52
Adjust some docs in ar_bsd and ar_sysv that were copied from ar_generic
dgelessus Mar 7, 2019
36e880f
Add GNU binutils thin ar archive format
dgelessus Mar 7, 2019
9827ceb
Move ar specs into their own subdirectory
dgelessus Mar 8, 2019
e3d826b
Change ar format descriptions to be less redundant
dgelessus Mar 9, 2019
585accd
Refactor common ar structures into their own shared KSY files
dgelessus Mar 11, 2019
6bcfec4
Fix missing import in ar/member_metadata.ksy
dgelessus Mar 11, 2019
4ed99d5
Refactor member name parsing to use switch-on again
dgelessus Mar 11, 2019
6b2a713
Add deb and udeb extensions to ar specs
dgelessus Mar 14, 2019
a3aa9e1
Add convenience instances for ar space_padded_number fields
dgelessus Dec 15, 2019
8e108af
Fix a typo in archive/ar/ar_gnu_thin.ksy
dgelessus Dec 15, 2019
af88c45
Move explanation of ar metadata fields into member_metadata.ksy
dgelessus Dec 15, 2019
6bb4a3d
Expand documentation in ar space_padded_number and member_metadata
dgelessus Dec 15, 2019
120 changes: 120 additions & 0 deletions archive/ar/ar_bsd.ksy
@@ -0,0 +1,120 @@
meta:
id: ar_bsd
title: Unix ar archive (BSD/Darwin variant)
application: ar
file-extension:
- a # Unix/generic
- rlib # Rust
- deb # Debian binary package
- udeb # Debian binary package
xref:
justsolve: AR
mime: application/x-archive
wikidata: Q300839
license: CC0-1.0
imports:
- member_metadata
- space_padded_number
doc: |
The BSD variant of the Unix ar archive format (see the `ar_generic` spec for general info about the ar format). This variant is also used on Darwin-based systems (mainly Apple's macOS and iOS).

BSD archives support member names that contain spaces or are longer than 16 bytes by storing the name as part of the member data rather than in the fixed-size name field.
doc-ref: |
https://en.wikipedia.org/w/index.php?title=Ar_(Unix)&oldid=880452895#File_format_details
https://docs.oracle.com/cd/E36784_01/html/E36873/ar.h-3head.html
https://llvm.org/docs/CommandGuide/llvm-ar.html#file-format
https://github.com/llvm/llvm-project/blob/llvmorg-7.0.1/llvm/lib/Object/Archive.cpp
seq:
- id: magic
-orig-id: ARMAG
contents: "!<arch>\n"
doc: Magic number.
- id: members
type: member
repeat: eos
doc: List of archive members. May be empty.
types:
regular_member_name:
seq:
- id: name
terminator: 0x20
pad-right: 0x20
doc: The member name, right-padded with spaces.
doc: |
A regular (or "short") member name, stored directly in the name field.

Note: Since regular names in BSD archives are terminated using spaces, file names that contain spaces cannot be stored as regular names. Such names must be stored as long names, even if they are not longer than 16 bytes.
long_member_name:
seq:
- id: magic
contents: '#1/'
doc: Indicates a long member name.
- id: name_size
type: space_padded_number(13, 10)
doc: The size of the long member name in bytes.
doc: A long member name, stored at the start of the member's data.
member_name:
seq:
- id: first_three_bytes
size: long_name_magic.length
doc: Internal helper field, do not use.
instances:
long_name_magic:
value: '[0x23, 0x31, 0x2f]'
doc: The ASCII bytes "#1/", indicating a long member name.
is_long:
value: first_three_bytes == long_name_magic
doc: Whether this is a reference to a long name (stored at the start of the member's data) or a regular name.
parsed:
pos: 0
type:
switch-on: is_long
cases:
true: long_member_name
false: regular_member_name
member:
seq:
- id: name_internal
-orig-id: ar_name
size: 16
type: member_name
doc: Internal helper field, do not use directly, use the `name` instance instead.
- id: metadata
type: member_metadata
doc: The member's metadata (timestamp, user and group ID, mode).
- id: size_raw
-orig-id: ar_size
size: 10
type: space_padded_number(10, 10)
doc: Raw version of size_with_long_name.
- id: header_terminator
-orig-id: ar_fmag
contents: "`\n"
doc: Marks the end of the header.
- id: long_name
size: name_internal.parsed.as<long_member_name>.name_size.value
terminator: 0x00
pad-right: 0x00
if: name_internal.is_long
doc: The member's long name, if any, possibly right-padded with null bytes.
- id: data
size: size
doc: The member's data.
- id: padding
contents: "\n"
if: size_with_long_name % 2 != 0
doc: An extra newline is added as padding after members with an odd data size. This ensures that all members are 2-byte-aligned.
instances:
name:
value: 'name_internal.is_long ? long_name : name_internal.parsed.as<regular_member_name>.name'
doc: |
The name of the archive member. Because the encoding of member names varies across systems, the name is exposed as a byte array.

Names are usually unique within an archive, but this is not required - the `ar` command even provides various options to work with archives containing multiple identically named members.
size_with_long_name:
value: size_raw.value
doc: The size of the member's data. The long member name (if any) counts toward this size value, but the trailing padding byte (if any) does not.
size:
value: 'name_internal.is_long ? size_with_long_name - name_internal.parsed.as<long_member_name>.name_size.value : size_with_long_name'
doc: The size of the member's data, excluding any long member name.
doc: An archive member's header and data.
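As a rough illustration of the BSD name handling described in this spec, the following Python sketch separates a member's name from its data. The function name `split_bsd_member` and its two-argument interface are hypothetical and not part of the PR; `body` is assumed to hold the bytes that follow the header, with a length equal to `size_with_long_name`.

```python
def split_bsd_member(name_field: bytes, body: bytes):
    """Return (name, data) for one BSD ar member.

    name_field: the raw 16-byte name field from the header.
    body: the size_with_long_name bytes that follow the header
          (hypothetical interface, chosen for this sketch).
    """
    if name_field.startswith(b"#1/"):
        # Long name: the digits after "#1/" give the name length in bytes.
        name_size = int(name_field[3:].decode("ascii"))
        # The stored name may be right-padded with null bytes.
        name = body[:name_size].split(b"\x00", 1)[0]
        return name, body[name_size:]
    # Regular name: everything up to the first space.
    return name_field.split(b" ", 1)[0], body
```

For example, a name field of `#1/20` (padded to 16 bytes) means that the first 20 bytes of the body hold the name and only the remainder is file data, which matches the `size` instance above.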
73 changes: 73 additions & 0 deletions archive/ar/ar_generic.ksy
@@ -0,0 +1,73 @@
meta:
id: ar_generic
title: Unix ar archive (generic superset)
application: ar
file-extension:
- a # Unix/generic
- lib # Windows
- rlib # Rust
- deb # Debian binary package
- udeb # Debian binary package
xref:
justsolve: AR
mime: application/x-archive
wikidata: Q300839
license: CC0-1.0
imports:
- member_metadata
- space_padded_number
# The ar format is somewhat unusual: although it can store arbitrary data files, the ar format itself is text-based - all fields and magic numbers are pure ASCII.
# In particular, numerical values are stored as ASCII-encoded decimal and octal numbers, rather than packed byte values. Because of this, the ar format has no endianness.
# No string encoding is specified either. As different systems use different encodings, all text (i.e. file names) is exposed as byte arrays.
doc: |
The Unix ar archive format, as created by the `ar` utility. It is a simple uncompressed flat archive format, but is rarely used for general-purpose archiving. Instead, it is commonly used by linkers to collect multiple object files along with a symbol table into a static library. The Debian package format (.deb) is also based on the ar format.

The ar format is not standardized and several variants have been developed, which differ mainly in how member names and the symbol table (if any) are stored. This specification describes the basic structure shared by all ar variants.
doc-ref: |
https://en.wikipedia.org/w/index.php?title=Ar_(Unix)&oldid=880452895#File_format_details
https://docs.oracle.com/cd/E36784_01/html/E36873/ar.h-3head.html
https://llvm.org/docs/CommandGuide/llvm-ar.html#file-format
https://github.com/llvm/llvm-project/blob/llvmorg-7.0.1/llvm/lib/Object/Archive.cpp
seq:
- id: magic
-orig-id: ARMAG
contents: "!<arch>\n"
doc: Magic number.
- id: members
type: member
repeat: eos
doc: List of archive members. May be empty.
types:
member:
seq:
- id: name
-orig-id: ar_name
size: 16
# We don't set a terminator for the name field, because different ar format variants use different terminators (see doc).
doc: |
The name of the archive member, right-padded with spaces. Because the exact format of this field differs between format variants, it is exposed as a fixed-size byte array. Long member names are not processed, and no terminator or padding characters are removed. To read member names correctly from an archive whose format variant is known, use the `ar_bsd` or `ar_sysv` specification.

Names are usually unique within an archive, but this is not required - the `ar` command even provides various options to work with archives containing multiple identically named members.
- id: metadata
type: member_metadata
doc: The member's metadata (timestamp, user and group ID, mode).
- id: size_raw
-orig-id: ar_size
type: space_padded_number(10, 10)
doc: Raw version of size.
- id: header_terminator
-orig-id: ar_fmag
contents: "`\n"
doc: Marks the end of the header.
- id: data
size: size
doc: The member's data.
- id: padding
contents: "\n"
if: size % 2 != 0
doc: An extra newline is added as padding after members with an odd data size. This ensures that all members are 2-byte-aligned.
instances:
size:
value: size_raw.value
doc: The size of the member's data. The trailing padding byte (if any) does not count toward the data size.
doc: An archive member's header and data.
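The member layout in this spec can be walked by hand with a few lines of Python. This is only a minimal sketch: `iter_members` is a hypothetical helper, and the metadata field widths (12 + 6 + 6 + 8 bytes) are an assumption taken from the traditional `ar.h` header layout, since `member_metadata.ksy` is not shown in this part of the diff.

```python
MAGIC = b"!<arch>\n"
# 16 (name) + 12 + 6 + 6 + 8 (metadata, assumed widths) + 10 (size) + 2 (terminator)
HEADER_SIZE = 60


def iter_members(archive: bytes):
    """Yield (raw_name, data) for each member, as ar_generic would expose them."""
    if not archive.startswith(MAGIC):
        raise ValueError("missing ar magic")
    pos = len(MAGIC)
    while pos < len(archive):
        header = archive[pos:pos + HEADER_SIZE]
        if len(header) != HEADER_SIZE or header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        raw_name = header[0:16]  # left unprocessed, like the `name` field
        # The size is ASCII decimal, right-padded with spaces.
        size = int(header[48:58].decode("ascii"))
        start = pos + HEADER_SIZE
        yield raw_name, archive[start:start + size]
        # Members with an odd data size are followed by one padding newline.
        pos = start + size + (size % 2)
```

As in the spec, the name is returned as a fixed-size byte string with its padding intact; variant-specific name handling is left to the caller.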
136 changes: 136 additions & 0 deletions archive/ar/ar_gnu_thin.ksy
@@ -0,0 +1,136 @@
meta:
id: ar_gnu_thin
title: GNU binutils thin ar archive
application: ar
file-extension:
- a
license: CC0-1.0
imports:
- space_padded_number
- member_metadata
doc: |
The thin ar archive format, as created by the GNU binutils `ar` utility using the `T` flag. GNU binutils uses thin archives as a more efficient alternative to the regular ar format for locally created static libraries. A thin archive stores only the paths of its member files (relative to the archive), not the files' actual data; to read a member's data, the original file must be looked up and read. This makes thin archives unsuitable for general-purpose archiving (in fact, GNU `ar` does not support manually extracting thin archives). They are only meant to be used as a static library format.

The internal structure of thin archives is very similar to regular System V/GNU ar archives, but the formats are not compatible.
doc-ref: https://sourceware.org/binutils/docs/binutils/ar.html
seq:
- id: magic
-orig-id: ARMAG
contents: "!<thin>\n"
doc: Magic number.
- id: members
type: member
repeat: eos
doc: List of archive members. May be empty.
instances:
long_name_list_name:
value: '[0x2f, 0x2f, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20]'
doc: The name of the special "long name list" member. This is a byte array containing "//" (two slashes) right-padded using 14 spaces (in ASCII).
long_name_list_index:
value: |
members.size > 0 and members[0].name_internal.raw == long_name_list_name ? 0
: members.size > 1 and members[1].name_internal.raw == long_name_list_name ? 1
: -1
doc: |
The index of the special "long name list" member in the members array, or `-1` if this archive doesn't contain a long name list.

Note: the long name list is only recognized if it is one of the first two archive members. This is because it always appears immediately after the symbol table (or, if there is no symbol table, at the very beginning of the archive).
long_name_list:
value: members[long_name_list_index]
if: long_name_list_index != -1
doc: A special archive member that holds a list of long names used by other archive members. (Optional, only present if the archive has members with long names.)
types:
long_member_name:
seq:
- id: slash
contents: "/"
- id: offset
type: space_padded_number(15, 10)
doc: The byte offset in the long name list at which the actual member name is stored.
instances:
name:
io: _root.long_name_list.data_internal._io
pos: offset.value
# The terminator is actually a slash followed by a newline, but multi-character terminators are not supported by Kaitai, and it's very unlikely that a path will contain a newline.
terminator: 0x0a
doc: The member name (actually a relative path) stored in the long name list, terminated by a slash and a newline. For technical reasons, includes the terminating slash (but not the newline).
doc: A long member name (actually a relative path), stored as a reference into the long name list.
special_member_name:
seq:
- id: name
terminator: 0x20
pad-right: 0x20
doc: The member name, as a byte array, right-padded using ASCII spaces.
doc: A "special" member name that does not follow the usual format. This kind of name is used for special members that do not represent a normal file, such as the symbol table (named "/") and the long name list (named "//").
member_name:
seq:
- id: raw
size: 16
doc: The name of the archive member as a 16-byte array, including any padding spaces at the end.
instances:
ascii_zero:
value: 0x30
ascii_nine:
value: 0x39
first_char:
pos: 0
type: u1
second_char:
pos: 1
type: u1
is_long:
value: first_char == 0x2f and second_char >= ascii_zero and second_char <= ascii_nine
parsed:
pos: 0
type:
switch-on: is_long
cases:
true: long_member_name
false: special_member_name
member_data:
seq:
- id: data
size-eos: true
doc: Dummy type representing a member's data. This type is used instead of a normal byte array to allow "looking into" it using instances (this is needed to handle long member names).
member:
seq:
- id: name_internal
-orig-id: ar_name
size: 16
type: member_name
doc: Internal helper field, do not use directly, use the `name` instance instead.
- id: metadata
type: member_metadata
doc: The member's metadata (timestamp, user and group ID, mode).
- id: size_raw
-orig-id: ar_size
type: space_padded_number(10, 10)
doc: Raw version of size.
- id: header_terminator
-orig-id: ar_fmag
contents: "`\n"
doc: Marks the end of the header.
- id: data_internal
type: member_data
size: size
if: not name_internal.is_long
doc: Internal helper field, do not use directly, use the `data` instance instead.
- id: padding
contents: "\n"
if: not name_internal.is_long and size % 2 != 0
doc: An extra newline is added as padding after members with an odd data size. This ensures that all members are 2-byte-aligned.
instances:
name:
value: 'name_internal.is_long ? name_internal.parsed.as<long_member_name>.name : name_internal.parsed.as<special_member_name>.name'
doc: |
The name of the archive member. Because the encoding of member names varies across systems, the name is exposed as a byte array.

Names are usually unique within an archive, but this is not required - the `ar` command even provides various options to work with archives containing multiple identically named members.
size:
value: size_raw.value
doc: The size of the member's data. The trailing padding byte (if any) does not count toward the data size.
data:
value: data_internal.data
if: not name_internal.is_long
doc: The member's data. Only present for special members.
doc: An archive member's header and data.
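The long name lookup used by this spec can be sketched in Python as follows. The function `resolve_thin_name` is hypothetical; it assumes the caller has already located the long name list member (named "//") and passes its data as `long_name_list`.

```python
def resolve_thin_name(name_field: bytes, long_name_list: bytes) -> bytes:
    """Resolve a 16-byte name field the way the `name` instance does.

    long_name_list: data of the "//" member (hypothetical parameter;
    may be empty if the archive has no long names).
    """
    # A slash followed by a digit marks a reference into the long name list.
    if name_field[:1] == b"/" and name_field[1:2].isdigit():
        offset = int(name_field[1:].decode("ascii"))
        # Entries end with "/\n". Like the spec, this keeps the trailing
        # slash and drops only the newline.
        end = long_name_list.index(b"\n", offset)
        return long_name_list[offset:end]
    # Special names such as "/" and "//" are simply space-padded.
    return name_field.split(b" ", 1)[0]
```

Stripping the trailing slash from the result is left to the caller, which mirrors the limitation noted in the `long_member_name` docs.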