PrimitiveTypes
BinData supports the primitive types most commonly used when working with binary data, namely:
- length based strings
- zero terminated strings
- byte based integers - signed or unsigned, big or little endian and of any size
- bit based integers - signed or unsigned, big or little endian and of any size
- floating point numbers - single or double precision floats in either big or little endian
Primitives may be manipulated individually, but it is more common to work with them as part of a record.
Examples of individual usage:
int16 = BinData::Int16be.new(941)
int16.to_binary_s #=> "\003\255"
fl = BinData::FloatBe.read("\100\055\370\124") #=> 2.71828174591064
fl.num_bytes #=> 4
fl * int16 #=> 2557.90320057996
Several parameters are common to all primitives.
:initial_value sets the value that the primitive contains after initialization. This is useful for setting default values.
obj = BinData::String.new(:initial_value => "hello ")
obj + "world" #=> "hello world"
obj.assign("good-bye ")
obj + "world" #=> "good-bye world"
:value makes the primitive always contain the given value. Reading or assigning will not change it. This parameter is used to define constants or dependent fields, such as a length field computed from another field:
pi = BinData::FloatLe.new(:value => Math::PI)
pi.assign(3)
puts pi #=> 3.14159265358979
class IntList < BinData::Record
  uint8 :len, :value => lambda { data.length }
  array :data, :type => :uint32be
end
list = IntList.new
list.data = [1, 2, 3]
list.len #=> 3
:assert raises a ValidityError when reading or assigning if the value does not match. The parameter may be a value to compare against, or a lambda whose result must be true.
obj = BinData::String.new(:assert => lambda { /aaa/ =~ value })
obj.read("baaa!") #=> "baaa!"
obj.read("bbb") #=> raises ValidityError
obj = BinData::String.new(:assert => "foo")
obj.read("foo") #=> "foo"
obj.assign("bar") #=> raises ValidityError
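The :assert semantics can be sketched in plain Ruby without BinData. The check_assert helper and the local ValidityError class below are hypothetical stand-ins. Unlike BinData, where the lambda refers to value implicitly, this sketch passes the value to the lambda as an argument.

```ruby
# Hypothetical stand-in for BinData::ValidityError.
class ValidityError < StandardError; end

# Check a value against an :assert-style expectation. The expectation is
# either a lambda (its result must be truthy) or a plain value (compared with ==).
def check_assert(expected, value)
  ok = expected.respond_to?(:call) ? expected.call(value) : expected == value
  raise ValidityError, "#{value.inspect} failed assertion" unless ok
  value
end

check_assert(lambda { |v| /aaa/ =~ v }, "baaa!") # returns "baaa!"
check_assert("foo", "foo")                        # returns "foo"
begin
  check_assert("foo", "bar")
rescue ValidityError => e
  puts "rejected: #{e.message}"                   # prints: rejected: "bar" failed assertion
end
```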
:asserted_value combines :assert and :value. It is a shortcut for when both :assert and :value have the same value. The following are logically equivalent:
obj = BinData::Uint32Be.new(:assert => 42, :value => 42)
obj = BinData::Uint32Be.new(:asserted_value => 42)
BinData supports three kinds of numeric types: byte based integers, bit based integers and floating point numbers.
Byte based integers are the common integers used in most low level programming languages (C, C++, Java, etc.). They can be signed or unsigned. The endianness must be specified so that the conversion is independent of the host architecture. Their bit size must be a multiple of 8. Examples of byte based integers are:
- uint16be: unsigned 16 bit big endian integer
- int8: signed 8 bit integer
- int32le: signed 32 bit little endian integer
- uint40be: unsigned 40 bit big endian integer
The be or le suffix may be omitted if the endian keyword is in use.
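For comparison, the byte based layouts can be reproduced in plain Ruby with Array#pack and String#unpack, without BinData. Ruby's pack directives cover 8, 16, 32 and 64 bit widths. For other multiples of 8, such as uint40be, the sketch uses hypothetical encode_uint and decode_uint helpers that build the bytes directly.

```ruby
# 941 as uint16be and uint16le ("n" = 16 bit big endian, "v" = 16 bit little endian).
[941].pack("n").bytes   # => [3, 173]
[941].pack("v").bytes   # => [173, 3]

# int32le: "l<" is a signed 32 bit little endian integer.
[-2].pack("l<").bytes   # => [254, 255, 255, 255]

# Hypothetical helper: encode a non-negative integer in any whole number
# of bytes, big or little endian (e.g. nbytes = 5 for uint40be).
def encode_uint(value, nbytes, endian)
  raise ArgumentError, "out of range" unless value >= 0 && value < (1 << (8 * nbytes))
  bytes = Array.new(nbytes) { |i| (value >> (8 * i)) & 0xff } # least significant first
  bytes.reverse! if endian == :big
  bytes.pack("C*")
end

# Hypothetical inverse of encode_uint.
def decode_uint(str, endian)
  bytes = str.bytes
  bytes.reverse! if endian == :little                          # most significant first
  bytes.inject(0) { |acc, b| (acc << 8) | b }
end

encode_uint(0x0102030405, 5, :big).bytes # => [1, 2, 3, 4, 5]
```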
Bit based integers are used to define bitfields in records. Bitfields default to unsigned and big endian, but signed and little endian may be specified explicitly. Little endian bitfields are rare, but do occur in older file formats (e.g. the file allocation table of a FAT12 filesystem is stored as an array of 12 bit little endian integers).
An array of bit based integers is packed according to the integers' endianness. Likewise, adjacent bitfields in a record are packed according to their endianness. All other fields are byte aligned.
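For an illustration of little endian bit packing, the FAT12 table mentioned above stores each pair of 12 bit entries in three bytes, filling each byte from its least significant bit. The plain-Ruby sketch below decodes such a table by treating the bytes as one little endian bit stream. read_le_bitfields is a hypothetical helper, not part of BinData, and models one common reading of little endian bit order that matches the FAT12 layout.

```ruby
# Read count little endian bitfields of nbits each from a binary string.
# Bits are taken from the least significant end of each byte first, and the
# first bit read becomes the least significant bit of the value.
def read_le_bitfields(str, nbits, count)
  bytes = str.bytes
  pos = 0
  Array.new(count) do
    value = 0
    nbits.times do |k|
      bit = (bytes.fetch(pos / 8) >> (pos % 8)) & 1
      value |= bit << k
      pos += 1
    end
    value
  end
end

# Two 12 bit FAT entries, 0xABC and 0x123, packed into three bytes:
# byte 0 = low 8 bits of entry 0; byte 1 = high 4 bits of entry 0 in its
# low nibble plus low 4 bits of entry 1 in its high nibble; byte 2 = high 8 bits of entry 1.
fat = [0xBC, 0x3A, 0x12].pack("C*")
read_le_bitfields(fat, 12, 2).map { |e| e.to_s(16) } # => ["abc", "123"]
```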
Examples of bit based integers are:
- bit1: 1 bit big endian integer (may be used as a boolean, see below)
- bit4_le: 4 bit little endian integer
- sbit4_le: 4 bit signed little endian integer
- bit32: 32 bit big endian integer
- sbit32: 32 bit signed big endian integer
The difference between byte and bit based integers of the same number of bits (e.g. uint8 vs bit8) is one of alignment.
This example is packed into 3 bytes:
class A < BinData::Record
  bit4  :a
  uint8 :b
  bit4  :c
end
Data is stored as: AAAA0000 BBBBBBBB CCCC0000
Whereas this example is packed into only 2 bytes:
class B < BinData::Record
  bit4 :a
  bit8 :b
  bit4 :c
end
Data is stored as: AAAABBBB BBBBCCCC
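The two layouts can be checked by hand in plain Ruby. The hypothetical pack_a and pack_b helpers below mimic records A and B. They illustrate the layouts described above and are not BinData code.

```ruby
# Record A: bit4, uint8, bit4. The uint8 is byte aligned, so each bit4
# sits in the high nibble of its own byte and the low nibble is padding.
def pack_a(a, b, c)
  [a << 4, b, c << 4].pack("C3")
end

# Record B: bit4, bit8, bit4. Adjacent big endian bitfields share bytes,
# so the 4 + 8 + 4 bits fill exactly two bytes, most significant bit first.
def pack_b(a, b, c)
  [(a << 12) | (b << 4) | c].pack("n")
end

pack_a(0xA, 0xBB, 0xC).bytes.map { |x| x.to_s(16) } # => ["a0", "bb", "c0"]
pack_b(0xA, 0xBB, 0xC).bytes.map { |x| x.to_s(16) } # => ["ab", "bc"]
```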
The number of bits in a bit based integer can be declared dynamically with the :nbits parameter. Such integers exist for all four combinations of signedness and endianness (bit, sbit, bit_le, sbit_le).
class Rectangle < BinData::Record
  bit5 :bit_length
  sbit :xmin, :nbits => :bit_length
  sbit :xmax, :nbits => :bit_length
  sbit :ymin, :nbits => :bit_length
  sbit :ymax, :nbits => :bit_length
end
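The Rectangle record reads a 5 bit length and then four signed values of that many bits. The values are packed big endian with no byte alignment in between. Below is a plain-Ruby sketch of reading that layout. BitReader and read_rectangle are hypothetical names, and any padding bits left in the last byte are simply ignored.

```ruby
# Hypothetical helper: reads big endian bitfields (most significant bit of
# each byte first) from a binary string, with no byte alignment.
class BitReader
  def initialize(str)
    @bytes = str.bytes
    @pos = 0 # current bit position
  end

  # Unsigned integer of n bits.
  def read_bits(n)
    value = 0
    n.times do
      bit = (@bytes.fetch(@pos / 8) >> (7 - @pos % 8)) & 1
      value = (value << 1) | bit
      @pos += 1
    end
    value
  end

  # Signed (two's complement) integer of n bits.
  def read_sbits(n)
    return 0 if n.zero?
    value = read_bits(n)
    value >= (1 << (n - 1)) ? value - (1 << n) : value
  end
end

def read_rectangle(str)
  reader = BitReader.new(str)
  nbits = reader.read_bits(5) # bit5 :bit_length
  [:xmin, :xmax, :ymin, :ymax].map { |name| [name, reader.read_sbits(nbits)] }.to_h
end

# nbits = 4, then 1, -1, 2, -2 as 4 bit signed values, padded to 3 bytes.
# Returns a Hash with xmin 1, xmax -1, ymin 2, ymax -2.
read_rectangle([0x20, 0xF9, 0x70].pack("C*"))
```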
Bit1 can be assigned boolean values as a convenience. The resulting value will be either 0 or 1, regardless of whether the assigned value was an integer or a boolean.
bit = BinData::Bit1.new
bit.assign(true)
bit.value #=> 1
bit.assign(false)
bit.value #=> 0
BinData supports 32 and 64 bit floating point numbers, in both big and little endian format. These types are:
- float_le: single precision 32 bit little endian float
- float_be: single precision 32 bit big endian float
- double_le: double precision 64 bit little endian float
- double_be: double precision 64 bit big endian float
The _be or _le suffix may be omitted if the endian keyword is in use.
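Plain Ruby provides the same four encodings through Array#pack and String#unpack. The directives "e" and "g" are single precision little and big endian, and "E" and "G" are double precision little and big endian. The sketch below decodes the bytes from the FloatBe example earlier on this page, without BinData.

```ruby
bytes = "\100\055\370\124".b       # the FloatBe example bytes

value = bytes.unpack("g").first     # single precision, big endian
puts value                          # about 2.7182817 (the nearest float32 to e)

# The same number as single precision little endian: the bytes are reversed.
[value].pack("e") == bytes.reverse  # => true

# Round tripping through single precision loses accuracy; double keeps it.
[Math::E].pack("g").unpack("g").first == Math::E # => false
[Math::E].pack("G").unpack("G").first == Math::E # => true
```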
Here is an example declaration for an Internet Protocol network packet.
class IP_PDU < BinData::Record
  endian :big
  bit4   :version, :value => 4
  bit4   :header_length
  uint8  :tos
  uint16 :total_length
  uint16 :ident
  bit3   :flags
  bit13  :frag_offset
  uint8  :ttl
  uint8  :protocol
  uint16 :checksum
  uint32 :src_addr
  uint32 :dest_addr
  string :options, :read_length => :options_length_in_bytes
  string :data, :read_length => lambda { total_length - header_length_in_bytes }
  def header_length_in_bytes
    header_length * 4
  end
  def options_length_in_bytes
    header_length_in_bytes - 20
  end
end
Three of the fields have parameters.
- The version field always has the value 4, as per the standard.
- The options field is read as a raw string, but not processed.
- The data field contains the payload of the packet. Its length is calculated as the total length of the packet minus the length of the header.
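The following plain-Ruby sketch shows how the bitfields of IP_PDU line up with whole bytes. It parses the same header layout with String#unpack, then shifts and masks the packed fields. parse_ip_pdu is a hypothetical helper that skips checksum validation and assumes a well-formed packet.

```ruby
# Parse an IPv4 packet laid out like the IP_PDU record above.
def parse_ip_pdu(packet)
  ver_ihl, tos, total_length, ident, flags_frag, ttl, protocol, checksum, src_addr, dest_addr =
    packet.unpack("CCnnnCCnNN")
  header_length = ver_ihl & 0x0f        # low nibble, in 32 bit words
  header_bytes = header_length * 4      # header_length_in_bytes
  {
    version: ver_ihl >> 4,              # high nibble
    header_length: header_length,
    tos: tos,
    total_length: total_length,
    ident: ident,
    flags: flags_frag >> 13,            # top 3 bits of the 16 bit word
    frag_offset: flags_frag & 0x1fff,   # low 13 bits
    ttl: ttl,
    protocol: protocol,
    checksum: checksum,
    src_addr: src_addr,
    dest_addr: dest_addr,
    options: packet.byteslice(20, header_bytes - 20),
    data: packet.byteslice(header_bytes, total_length - header_bytes)
  }
end

# A 20 byte header (version 4, header_length 5) followed by a 4 byte payload.
header = [0x45, 0, 24, 0x1234, 0x4000, 64, 17, 0, 0x0a000001, 0x0a000002].pack("CCnnnCCnNN")
pdu = parse_ip_pdu(header + "ping")
pdu[:flags] # => 2
pdu[:data]  # => "ping"
```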
BinData supports two types of strings: explicitly sized and zero terminated. Internally, strings are treated as a sequence of 8 bit bytes, the same as strings in Ruby 1.8. BinData fully supports Ruby 1.9/2.0 string encodings. See this FAQ entry for details.
Sized strings may have a set length (in bytes). If an assigned value is shorter than this length, it will be padded to this length. If no length is set, the length is taken to be the length of the assigned value.
There are several parameters that are specific to sized strings.
:length sets the fixed length of the string. If a shorter string is assigned, it will be padded to this length. Longer strings will be truncated.
obj = BinData::String.new(:length => 6)
obj.read("abcdefghij")
obj #=> "abcdef"
obj = BinData::String.new(:length => 6)
obj.assign("abcd")
obj #=> "abcd\000\000"
obj = BinData::String.new(:length => 6)
obj.assign("abcdefghij")
obj #=> "abcdef"
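Ruby's own "a" pack directive gives the same pad-or-truncate behaviour for a fixed length, padding with null bytes. Here is a quick plain-Ruby check, independent of BinData:

```ruby
["abcd"].pack("a6")           # => "abcd\x00\x00"  (padded with "\0")
["abcdefghij"].pack("a6")     # => "abcdef"        (truncated)
"abcdefghij".byteslice(0, 6)  # => "abcdef"        (reading 6 bytes)
```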
:read_length sets the length in bytes to use when reading a value. It is used when a string is read and then written with a possibly different length.
obj = BinData::String.new(:read_length => 5)
obj.read("abcdefghij")
obj #=> "abcde"
obj.assign("abc")
obj.to_binary_s #=> "abc"
:read_length is also needed to prevent ambiguity when declaring a String with both a value and a length.
The following is ambiguous. Does it read 2 or 3 bytes? Does it write 2 or 3 bytes?
obj = BinData::String.new(:value => "abc", :length => 2)
Using :read_length prevents the ambiguity. It reads 2 bytes but writes 3.
obj = BinData::String.new(:value => "abc", :read_length => 2)
:pad_front (boolean, default false) signifies that the padding occurs at the front of the string rather than the end.
obj = BinData::String.new(:length => 6, :pad_front => true)
obj.assign("abcd")
obj.snapshot #=> "\000\000abcd"
:pad_byte (defaults to "\0") is the character to use when padding a string to a set length. Valid values are Integers and Strings of one byte. Multi byte padding is not supported.
obj = BinData::String.new(:length => 6, :pad_byte => 'A')
obj.assign("abcd")
obj.snapshot #=> "abcdAA"
obj.to_binary_s #=> "abcdAA"
:trim_padding (boolean, default false): if set, all trailing pad bytes are trimmed from the value of this string. The value is not trimmed when writing.
obj = BinData::String.new(:length => 6, :trim_padding => true)
obj.assign("abcd")
obj.snapshot #=> "abcd"
obj.to_binary_s #=> "abcd\000\000"
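The padding options can be modelled in a few lines of plain Ruby. The sketch below defines hypothetical pad_string and trim_string helpers that mirror the behaviour described above for :length, :pad_byte, :pad_front and :trim_padding. It is not BinData's implementation.

```ruby
# Normalise a pad byte given as an Integer or a one byte String.
def pad_char(pad_byte)
  pad = (pad_byte.is_a?(Integer) ? pad_byte.chr : pad_byte).b
  raise ArgumentError, "multi byte padding is not supported" unless pad.bytesize == 1
  pad
end

# Pad or truncate value to length bytes; pad_front puts the padding first.
def pad_string(value, length, pad_byte: "\0", pad_front: false)
  pad = pad_char(pad_byte)
  value = value.b.byteslice(0, length)  # truncate longer values
  padding = pad * (length - value.bytesize)
  pad_front ? padding + value : value + padding
end

# Model of :trim_padding: strip trailing pad bytes from the value exposed
# to Ruby (the written binary would keep them).
def trim_string(value, pad_byte: "\0")
  pad = pad_char(pad_byte)
  result = value.b
  result = result.byteslice(0, result.bytesize - 1) while result.end_with?(pad)
  result
end

pad_string("abcd", 6, pad_front: true) # => "\x00\x00abcd"
pad_string("abcd", 6, pad_byte: "A")   # => "abcdAA"
trim_string(pad_string("abcd", 6))     # => "abcd"
```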
Zero terminated strings are modeled on C strings: a sequence of bytes terminated by a null ("\0") byte.
obj = BinData::Stringz.new
obj.read("abcd\000efgh")
obj #=> "abcd"
obj.num_bytes #=> 5
obj.to_binary_s #=> "abcd\000"
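The same read and write behaviour can be sketched in plain Ruby. StringIO#gets with a "\0" separator consumes the terminator, and the "Z*" pack directive appends one when writing. read_stringz is a hypothetical helper, not part of BinData.

```ruby
require "stringio"

# Read one zero terminated string from io, consuming the terminator.
# Returns the string without the "\0" (or the remaining bytes if no
# terminator is found), or nil at end of input.
def read_stringz(io)
  raw = io.gets("\0") # reads up to and including the first "\0"
  raw && raw.chomp("\0")
end

io = StringIO.new("abcd\0efgh")
read_stringz(io) # => "abcd"
io.pos           # => 5, the same as num_bytes above

["abcd"].pack("Z*")              # => "abcd\x00" (writing appends the terminator)
"abcd\0efgh".unpack("Z*").first  # => "abcd"
```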
Most user defined types will be Records, but occasionally we'd like to create a custom primitive type.
Let us revisit the Pascal String example.
class PascalString < BinData::Record
  uint8  :len, :value => lambda { data.length }
  string :data, :read_length => :len
end
We'd like to make PascalString a user defined type that behaves like a BinData::BasePrimitive object, so that we can use :initial_value etc.
Here's an example usage of what we'd like:
class Favourites < BinData::Record
  pascal_string :language, :initial_value => "ruby"
  pascal_string :os, :initial_value => "unix"
end
f = Favourites.new
f.os = "freebsd"
f.to_binary_s #=> "\004ruby\007freebsd"
We create this type of custom string by inheriting from BinData::Primitive (instead of BinData::Record) and implementing the #get and #set methods.
class PascalString < BinData::Primitive
  uint8  :len, :value => lambda { data.length }
  string :data, :read_length => :len

  def get
    self.data
  end

  def set(v)
    self.data = v
  end
end
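Without BinData, the wire format of PascalString is just a length byte followed by that many bytes. This plain-Ruby sketch uses hypothetical pascal_encode and pascal_decode helpers to reproduce the Favourites output shown above.

```ruby
# Encode a string as a uint8 length prefix followed by its bytes.
def pascal_encode(str)
  raise ArgumentError, "too long for a uint8 length" if str.bytesize > 255
  [str.bytesize].pack("C") + str.b
end

# Decode one Pascal string starting at byte offset; returns [string, next_offset].
def pascal_decode(bin, offset = 0)
  len = bin.getbyte(offset)
  [bin.byteslice(offset + 1, len), offset + 1 + len]
end

bin = pascal_encode("ruby") + pascal_encode("freebsd")
bin == "\004ruby\007freebsd" # => true
pascal_decode(bin)           # => ["ruby", 5]
pascal_decode(bin, 5)        # => ["freebsd", 13]
```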
A user defined primitive type has both an internal (binary structure) and an external (ruby interface) representation. The internal representation is encapsulated and inaccessible from the external ruby interface.
Consider a LispBool type that uses :t for true and nil for false. The binary representation is a signed byte with value 1 for true and -1 for false.
class LispBool < BinData::Primitive
  int8 :val

  def get
    case self.val
    when 1
      :t
    when -1
      nil
    else
      nil # unknown value, default to false
    end
  end

  def set(v)
    case v
    when :t
      self.val = 1
    when nil
      self.val = -1
    else
      self.val = -1 # unknown value, default to false
    end
  end
end
b = LispBool.new
b.assign(:t)
b.to_binary_s #=> "\001"
b.read("\xff")
b.snapshot #=> nil
#read and #write use the internal representation. #assign and #snapshot use the external representation. Mixing them up will lead to undefined behaviour.
b = LispBool.new
b.assign(1) #=> undefined. Don't do this.
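The split between the two representations can be seen in plain Ruby, without BinData. The hypothetical helpers below convert the external value (:t or nil) to the internal int8 and then to the binary byte, and back again. They use the same defaulting rules as LispBool.

```ruby
# external -> internal: :t is 1, anything else is treated as false (-1).
def lisp_bool_set(external)
  external == :t ? 1 : -1
end

# internal -> external: 1 is :t, -1 and unknown values are nil.
def lisp_bool_get(internal)
  internal == 1 ? :t : nil
end

binary = [lisp_bool_set(:t)].pack("c")     # int8 on the wire
binary.bytes                               # => [1]
lisp_bool_get("\xff".b.unpack("c").first)  # => nil (the int8 -1)
lisp_bool_get(binary.unpack("c").first)    # => :t
```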
Sometimes a user defined primitive type cannot easily be defined declaratively. In this case you should inherit from BinData::BasePrimitive and implement the following three methods:
- value_to_binary_string(value): takes a Ruby value (String, Numeric, etc.) and converts it to the appropriate binary string representation.
- read_and_return_value(io): reads a number of bytes from io and returns a Ruby object that represents these bytes.
- sensible_default: the Ruby value that a cleared object should return.
If you wish to access parameters from inside these methods, you can use eval_parameter(key).
Here is an example of a big integer implementation.
# A custom big integer format. Binary format is:
#   1 byte  : 0 for positive, non zero for negative
#   x bytes : Little endian stream of 7 bit bytes representing the
#             positive form of the integer. The upper bit of each byte
#             is set when there are more bytes in the stream.
class BigInteger < BinData::BasePrimitive
  def value_to_binary_string(value)
    negative = (value < 0) ? 1 : 0
    value = value.abs
    bytes = [negative]
    loop do
      seven_bit_byte = value & 0x7f
      value >>= 7
      has_more = value.nonzero? ? 0x80 : 0
      byte = has_more | seven_bit_byte
      bytes.push(byte)
      break if has_more.zero?
    end
    bytes.collect { |b| b.chr }.join
  end

  def read_and_return_value(io)
    negative = read_uint8(io).nonzero?
    value = 0
    bit_shift = 0
    loop do
      byte = read_uint8(io)
      has_more = byte & 0x80
      seven_bit_byte = byte & 0x7f
      value |= seven_bit_byte << bit_shift
      bit_shift += 7
      break if has_more.zero?
    end
    negative ? -value : value
  end

  def sensible_default
    0
  end

  def read_uint8(io)
    io.readbytes(1).unpack("C").at(0)
  end
end
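BigInteger's methods expect a BinData IO object, which makes them awkward to try in isolation. The plain-Ruby sketch below implements the same encoding: a sign byte followed by little endian 7 bit groups with a continuation bit. It uses StringIO in place of BinData's IO, and encode_big_integer and decode_big_integer are hypothetical names.

```ruby
require "stringio"

# Sign byte, then 7 bit groups, least significant first; the 0x80 bit
# marks that more groups follow.
def encode_big_integer(value)
  bytes = [value < 0 ? 1 : 0]
  value = value.abs
  loop do
    group = value & 0x7f
    value >>= 7
    bytes << (value.nonzero? ? (0x80 | group) : group)
    break if value.zero?
  end
  bytes.pack("C*")
end

def decode_big_integer(io)
  read_byte = lambda do
    byte = io.read(1)
    raise EOFError, "truncated BigInteger" if byte.nil?
    byte.getbyte(0)
  end
  negative = read_byte.call.nonzero?
  value = 0
  shift = 0
  loop do
    byte = read_byte.call
    value |= (byte & 0x7f) << shift
    shift += 7
    break if (byte & 0x80).zero?
  end
  negative ? -value : value
end

encode_big_integer(300).bytes                              # => [0, 172, 2]
decode_big_integer(StringIO.new(encode_big_integer(-300))) # => -300
```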