Add the different variants of the Unix ar format #126

Open
wants to merge 19 commits into
base: master
93ce39f
Add specs for Unix ar archives (generic and System V variants)
dgelessus Mar 4, 2019
0ea56cf
Add BSD variant of the Unix ar format
dgelessus Mar 6, 2019
fde4630
Clean up some minor things in generic and SysV ar specs
dgelessus Mar 6, 2019
09b55c1
Remove unused types from ar_bsd
dgelessus Mar 7, 2019
6f95f82
Replace member_name type switch with separate conditional instances
dgelessus Mar 7, 2019
b5ca335
Remove some leftover copy-paste junk
dgelessus Mar 7, 2019
afc04b5
Group ar_bsd name kinds into separate types
dgelessus Mar 7, 2019
a4ccf52
Adjust some docs in ar_bsd and ar_sysv that were copied from ar_generic
dgelessus Mar 7, 2019
36e880f
Add GNU binutils thin ar archive format
dgelessus Mar 7, 2019
9827ceb
Move ar specs into their own subdirectory
dgelessus Mar 8, 2019
e3d826b
Change ar format descriptions to be less redundant
dgelessus Mar 9, 2019
585accd
Refactor common ar structures into their own shared KSY files
dgelessus Mar 11, 2019
6bcfec4
Fix missing import in ar/member_metadata.ksy
dgelessus Mar 11, 2019
4ed99d5
Refactor member name parsing to use switch-on again
dgelessus Mar 11, 2019
6b2a713
Add deb and udeb extensions to ar specs
dgelessus Mar 14, 2019
a3aa9e1
Add convenience instances for ar space_padded_number fields
dgelessus Dec 15, 2019
8e108af
Fix a typo in archive/ar/ar_gnu_thin.ksy
dgelessus Dec 15, 2019
af88c45
Move explanation of ar metadata fields into member_metadata.ksy
dgelessus Dec 15, 2019
6bb4a3d
Expand documentation in ar space_padded_number and member_metadata
dgelessus Dec 15, 2019
Add BSD variant of the Unix ar format
dgelessus committed Mar 7, 2019
commit 0ea56cf1a6a6f5bca475eabc74ae6b6c5a860bcc
164 changes: 164 additions & 0 deletions archive/ar_bsd.ksy
@@ -0,0 +1,164 @@
meta:
id: ar_bsd
title: Unix ar archive (BSD/Darwin variant)
application: ar
file-extension:
- a # Unix/generic
- rlib # Rust
xref:
justsolve: AR
mime: application/x-archive
wikidata: Q300839
license: CC0-1.0
# The ar format is somewhat unusual: although it can store arbitrary data files, the ar format itself is text-based - all fields and magic numbers are pure ASCII.
# In particular, numerical values are stored as ASCII-encoded decimal and octal numbers, rather than packed byte values. Because of this, the ar format has no endianness.
# Note: the encoding specified here is not used to interpret member names. As different systems use different encodings, they are exposed as byte arrays.
encoding: ASCII
doc: |
The Unix ar archive format, as created by the `ar` utility. It is a simple uncompressed flat archive format, but is rarely used for general-purpose archiving. Instead, it is commonly used by linkers to collect multiple object files along with a symbol table into a static library. The Debian package format (.deb) is also based on the ar format.

The ar format is not standardized and several variants have been developed, which differ mainly in how member names and the symbol table (if any) are stored. This specification describes the BSD variant, which is also used on Darwin-based systems (mainly Apple's macOS and iOS).
doc-ref: |
https://en.wikipedia.org/w/index.php?title=Ar_(Unix)&oldid=880452895#File_format_details
https://docs.oracle.com/cd/E36784_01/html/E36873/ar.h-3head.html
https://llvm.org/docs/CommandGuide/llvm-ar.html#file-format
https://github.com/llvm/llvm-project/blob/llvmorg-7.0.1/llvm/lib/Object/Archive.cpp
seq:
- id: magic
-orig-id: ARMAG
contents: "!<arch>\n"
doc: Magic number.
- id: members
type: member
repeat: eos
doc: List of archive members. May be empty.
types:
regular_member_name:
seq:
- id: name
terminator: 0x20
pad-right: 0x20
doc: The member name, right-padded with spaces.
doc: A regular (or "short") member name, stored directly in the name field.
long_member_name:
seq:
- id: magic
contents: "#1/"
- id: size_dec
type: str
terminator: 0x20
pad-right: 0x20
doc: The size of the long member name in bytes, in ASCII decimal, right-padded with spaces.
instances:
size:
value: size_dec.to_i
doc: The size of the long member name in bytes, parsed as an integer.
member_name:
seq:
- id: first_three_bytes
size: long_name_magic.length
doc: Internal helper field, do not use.
instances:
long_name_magic:
value: '[0x23, 0x31, 0x2f]'
doc: The ASCII bytes "#1/", indicating a long member name.
is_long_name:
value: first_three_bytes == long_name_magic
doc: Whether this is a reference to a long name (stored at the start of the member's data) or a regular name.
regular_name:
pos: 0
size-eos: true
terminator: 0x20
pad-right: 0x20
doc: The regular member name, right-padded with spaces.
if: not is_long_name
long_name_size_dec:
pos: long_name_magic.length
size-eos: true
type: str
terminator: 0x20
pad-right: 0x20
doc: The size of the long member name in bytes, in ASCII decimal, right-padded with spaces.
if: is_long_name
long_name_size:
value: long_name_size_dec.to_i
doc: The size of the long member name in bytes, parsed as an integer.
if: is_long_name
member:
seq:
- id: name_internal
-orig-id: ar_name
size: 16
type: member_name
doc: Internal helper field; do not use directly. Use the `name` instance instead.
- id: modified_timestamp_dec
-orig-id: ar_date
size: 12
type: str
terminator: 0x20
pad-right: 0x20
doc: The member's modification time, as a Unix timestamp, in ASCII decimal, right-padded with spaces.
- id: user_id_dec
-orig-id: ar_uid
size: 6
type: str
terminator: 0x20
pad-right: 0x20
doc: The member's user ID, in ASCII decimal, right-padded with spaces.
- id: group_id_dec
-orig-id: ar_gid
size: 6
type: str
terminator: 0x20
pad-right: 0x20
doc: The member's group ID, in ASCII decimal, right-padded with spaces.
- id: mode_oct
-orig-id: ar_mode
size: 8
type: str
terminator: 0x20
pad-right: 0x20
doc: The member's mode bits, in ASCII octal, right-padded with spaces.
- id: size_raw_dec
-orig-id: ar_size
size: 10
type: str
terminator: 0x20
pad-right: 0x20
doc: The size of the member's data, in ASCII decimal, right-padded with spaces. The long member name (if any) counts toward the data size, but the trailing padding byte (if any) does not.
- id: header_terminator
-orig-id: ar_fmag
contents: "`\n"
doc: Marks the end of the header.
- id: long_name
size: name_internal.long_name_size
terminator: 0x00
pad-right: 0x00
if: name_internal.is_long_name
doc: The member's long name, if any, possibly right-padded with null bytes.
- id: data
size: size
doc: The member's data.
- id: padding
contents: "\n"
if: size_raw % 2 != 0
doc: An extra newline is added as padding after members whose raw data size (including any long member name) is odd. This ensures that all members are 2-byte-aligned.
instances:
size_raw:
value: size_raw_dec.to_i
doc: The size of the member's data, including any long member name, parsed as an integer.
name:
value: 'name_internal.is_long_name ? long_name : name_internal.regular_name'
doc: |
The name of the archive member. Because the encoding of member names varies across systems, the name is exposed as a byte array.

Names are usually unique within an archive, but this is not required - the `ar` command even provides various options to work with archives containing multiple identically named members.
size:
value: 'name_internal.is_long_name ? size_raw - name_internal.long_name_size : size_raw'
doc: The size of the member's data, excluding any long member name.
doc: |
An archive member's header and data.

By default, modern ar implementations set the modification timestamp, user ID and group ID to 0 and the mode to 644 (octal), regardless of the file's original metadata, to make archive creation reproducible.

Rarely, the modification timestamp, user ID, group ID and mode fields may be blank (only spaces). This is the case in particular for the '//' member (the long name list) of SysV archives.
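
For readers who want to check the layout described by this spec without compiling it, the following is a minimal Python sketch of the same parsing steps: the 60-byte ASCII header, the `#1/` long-name convention, and the trailing padding newline. It is not part of the PR and not generated by Kaitai Struct. The function names `parse_bsd_ar` and `make_member` are hypothetical helpers introduced here for illustration. The sketch assumes that padding follows the parity of the raw size field (which includes any long name), and it strips trailing spaces from short names rather than stopping at the first space.

```python
import io

MAGIC = b"!<arch>\n"
LONG_NAME_MAGIC = b"#1/"   # marks a BSD-style long member name
HEADER_SIZE = 60           # 16 + 12 + 6 + 6 + 8 + 10 + 2 bytes


def parse_bsd_ar(buf: bytes):
    """Yield (name, data) byte-string pairs for each member (illustrative only)."""
    stream = io.BytesIO(buf)
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = stream.read(HEADER_SIZE)
        if not header:
            return  # clean end of archive
        if len(header) != HEADER_SIZE or header[58:60] != b"`\n":
            raise ValueError("truncated or malformed member header")
        name_field = header[0:16]
        # The size field is ASCII decimal, right-padded with spaces.
        size_raw = int(header[48:58].decode("ascii"))
        if name_field.startswith(LONG_NAME_MAGIC):
            # The name field holds "#1/<length>"; the name itself precedes the data.
            name_size = int(name_field[len(LONG_NAME_MAGIC):].decode("ascii"))
            name = stream.read(name_size).rstrip(b"\x00")
            data_size = size_raw - name_size
        else:
            name = name_field.rstrip(b" ")
            data_size = size_raw
        data = stream.read(data_size)
        if size_raw % 2 != 0:
            stream.read(1)  # assumption: padding depends on the raw size
        yield name, data


def make_member(name: bytes, data: bytes) -> bytes:
    """Build one member with zeroed metadata, as in reproducible archives."""
    if len(name) > 16 or b" " in name:
        name_field = LONG_NAME_MAGIC + str(len(name)).encode("ascii")
        payload = name + data  # the long name counts toward the size field
    else:
        name_field = name
        payload = data
    header = (
        name_field.ljust(16)
        + b"0".ljust(12)    # modification timestamp
        + b"0".ljust(6)     # user ID
        + b"0".ljust(6)     # group ID
        + b"644".ljust(8)   # mode, ASCII octal
        + str(len(payload)).encode("ascii").ljust(10)
        + b"`\n"
    )
    padding = b"\n" if len(payload) % 2 else b""
    return header + payload + padding
```

A round trip such as `list(parse_bsd_ar(MAGIC + make_member(b"a.o", b"abc")))` exercises the short-name path. Passing a name longer than 16 bytes exercises the `#1/` path, where the size field covers both the name and the data.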