Implement simulator #873

senier · 2021-12-09T11:58:24Z

Background

Goals:

Replace PyRFLX
From a RecordFlux specification
- Parse messages easily
- Generate messages
- Run sessions
Pythonic / natural implementation
- Strict checking of invariants nonetheless
- Suitable for typed Python
- All values that can be calculated are set automatically
- Values that are set automatically cannot be set manually
- Code should have no style issues (e.g. pylint)
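
The goals on automatically calculated values can be illustrated with a small stdlib-only sketch. The class `TlvMessage` below is hypothetical and not part of RecordFlux or PyRFLX; it only shows one way a derived field (`length`) could be computed automatically and rejected when set manually.

```python
class TlvMessage:
    """Hypothetical message object; not an existing RecordFlux API."""

    def __init__(self, tag: int, value: bytes = b"") -> None:
        self.tag = tag
        self.value = value

    @property
    def length(self) -> int:
        # Derived from `value`, so it is never stored separately.
        return len(self.value)


message = TlvMessage(tag=1, value=b"\xde\xad")

try:
    # A property without a setter refuses manual assignment.
    message.length = 5  # type: ignore[misc]
except AttributeError:
    rejected = True
else:
    rejected = False
```

A read-only property also works with static type checkers, which would flag the assignment before the code runs.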

Related Work - Message Parsing / Generation

https://github.com/Componolit/rflx_simulator_experiments

Examples

(tests/data/specs/tlv.rflx)

package TLV is

   type Tag is (Msg_Data => 1, Msg_Error => 3) with Size => 8;
   type Length is range 0 .. 2 ** 16 - 1 with Size => 16;

   type Message is
      message
         Tag : Tag
            then Length
               if Tag = Msg_Data
            then null
               if Tag = Msg_Error;
         Length : Length
            then Value
               with Size => Length * 8;
         Value : Opaque;
      end message;
end TLV;

Test Data - Msg_Data

01 00 02 de ad
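
For reference, the bytes above can be decoded by hand as follows. This is a minimal stdlib sketch, not generated code; the function name `parse_tlv` is made up, and big-endian byte order is assumed to match the other examples in this issue.

```python
import struct
from typing import Optional, Tuple

MSG_DATA = 1
MSG_ERROR = 3


def parse_tlv(data: bytes) -> Tuple[int, Optional[int], Optional[bytes]]:
    """Decode one TLV message following the TLV specification."""
    if not data:
        raise ValueError("empty input")
    tag = data[0]
    if tag == MSG_ERROR:
        # "then null": the message ends after the tag.
        return tag, None, None
    if tag != MSG_DATA:
        raise ValueError(f"invalid tag: {tag}")
    if len(data) < 3:
        raise ValueError("truncated length field")
    (length,) = struct.unpack(">H", data[1:3])  # 16 bit, big-endian assumed
    value = data[3 : 3 + length]
    if len(value) != length:
        raise ValueError("truncated value field")
    return tag, length, value


result = parse_tlv(bytes.fromhex("01 00 02 de ad"))
```

Here `result` holds the tag, the length, and the two value bytes; a message with tag `Msg_Error` yields no length and no value.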

Test Data - Msg_Error

Test Data - Invalid Tag

Construct

https://github.com/construct/construct

Specification

CONSTRUCT_TLV = construct.Struct(
   "tag" / construct.Enum(construct.Int8ub, MSG_DATA=1, MSG_ERROR=3),
   construct.StopIf(this.tag != 1),
   "length" / construct.Int16ub,
   "value" / construct.Bytes(this.length)
)

This is probably wrong: it does not make "length" and "value" optional as I would have expected. The project has significant documentation, but it often does not cover the more complicated cases.

Parsing

result = CONSTRUCT_TLV.parse(b"\x01\x00\x02\xde\xad")
assert result.tag == CONSTRUCT_TLV.tag.MSG_DATA

# The specification does not parse correctly!
# assert result.length == 2
# assert result.value == b"\xde\xad"

Generation

assert (
   CONSTRUCT_TLV.build({"tag": 1, "length": 2, "value": b"\xde\xad"})
   == b"\x01\x00\x02\xde\xad"
)

Hachoir

https://github.com/vstinner/hachoir

Specification

class HachoirTLV(Parser):
   tag_types = {
      1: "Msg_Data",
      3: "Msg_Error"
   }

   endian = hachoir.stream.BIG_ENDIAN

   def createFields(self):
      yield hachoir.field.Enum(
         hachoir.field.UInt8(self, "tag", "Tag"), self.tag_types
      )

      if self["tag"].value == 1:
         yield hachoir.field.UInt16(self, "length", "Length")
         yield hachoir.field.Bytes(self, "value", self["length"].value)

Parsing

tlv = HachoirTLV(hachoir.stream.StringInputStream(TEST_DATA_DATA))
assert tlv["tag"].value == 1
assert tlv["length"].value == 2
assert tlv["value"].value == b'\xde\xad'

Generation

Not supported (there is an editor module to change parsed data, though).

Kaitai Struct

https://kaitai.io/

Specification

meta:
   id: tlv
   endian: be
seq:
   - id: tag
     type: u1
     enum: tag
   - id: len_value
     type: u2
     if: tag == tag::data
   - id: value
     size: len_value
     if: tag == tag::data
enums:
   tag:
     1: data
     3: error

The specification is translated to Python code using the Kaitai Struct compiler (ksc):

$ ksc --target python tlv.ksy

The resulting tlv.py file contains the parser. A support library (kaitaistruct) is required for it to work.

Parsing

tlv = kaitai.Tlv.from_bytes(TEST_DATA_DATA)
assert tlv.tag == tlv.tag.data
assert tlv.len_value == 2
assert tlv.value == b'\xde\xad'

Generation

Not supported

Python Suitcase

https://github.com/digidotcom/python-suitcase

Specification

class SuitcaseTLV(Structure):
   tag = suitcase.fields.UBInt8()
   length = suitcase.fields.ConditionalField(
      suitcase.fields.LengthField(suitcase.fields.UBInt16()),
      lambda m: m.tag == 1
   )
   value = suitcase.fields.ConditionalField(
      suitcase.fields.Payload(length),
      lambda m: m.tag == 1
   )

Parsing

tlv = SuitcaseTLV()
tlv.unpack(TEST_DATA_DATA)
assert tlv.tag == 1
assert tlv.length == 2
assert tlv.value == b'\xde\xad'

Generation

tlv = SuitcaseTLV()
tlv.tag = 1

# Length field is calculated automatically
# When trying to set it, we get:
# suitcase.exceptions.SuitcaseProgrammingError:
# Cannot set the value of a LengthField
# tlv.length = 2

tlv.value = b'\xde\xad'
assert tlv.pack() == TEST_DATA_DATA

Scapy

https://scapy.net/

Specification

class ScapyTLV(scapy.Packet):
    fields_desc = [
        scapy.ByteEnumField(
            "tag",
            0,
            {
                1: "DATA",
                3: "ERROR"
            }
        ),
        scapy.ConditionalField(
            scapy.FieldLenField("len", None, length_of="value"),
            lambda pkt: pkt.tag == 1
        ),
        scapy.ConditionalField(
            scapy.StrLenField(
                "value",
                "",
                length_from=lambda pkt: pkt.len
            ),
            lambda pkt: pkt.tag == 1
        )
    ]

Parsing

result = ScapyTLV(TEST_DATA_DATA)
assert result.tag == 1
assert result.len == 2
assert result.value == b'\xde\xad'

Generation

result = ScapyTLV(tag = 1, value = b'\xde\xad')
assert scapy.raw(result) == TEST_DATA_DATA

Message Parser Design

Option 1.1

simulator = rflx.Simulator("tlv.rflx").tlv.message
message = simulator.parse(data)

Pro: Natural interface
Con: Static typing may be impossible / hard

Option 1.2

message = rflx.Simulator("tlv.rflx", ["TLV", "Message"], data)

Pro: Short, easier and more natural to use programmatically
Con: Generation is asymmetric, no reuse of same model with different data

Option 1.3

message = rflx.Simulator("tlv.rflx").tlv.message
message.unpack(data)

Pro: Symmetric interface possible, reuse of previously parsed message
Con: Stateful

Option 1.4

simulator = rflx.Simulator("tlv.rflx")["TLV"]["Message"]
message = simulator.parse(data)

Pro: Easier and more natural to use programmatically
Con: Static typing may be impossible / hard

Option 1.5

simulator = rflx.Simulator("tlv.rflx")
message = simulator.tlv.message(checksum=lambda x: crc(x))
message.parse(data)

Pro: Place where checksum functions (and later parameters) are passed is consistent with specification, package hierarchy in the spec is mirrored by the code
Con: Checksum needs to be passed whenever a message is constructed

Option 1.6

@rflx.simulator.from_file("tlv.rflx")
class MySimulator(rflx.simulator.Simulator):
    def tlv_message_checksum(x: int) -> int:
        return crc(x)

simulator = MySimulator()
simulator.tlv.message.parse(data)

Alternative version with inline specification:

@rflx.simulator.from_string(
"""
package TLV is

   type Tag is (Msg_Data => 1, Msg_Error => 3) with Size => 8;
   type Length is range 0 .. 2 ** 16 - 1 with Size => 16;

   type Message is
      message
         Tag : Tag
            then Length
               if Tag = Msg_Data
            then null
               if Tag = Msg_Error;
         Length : Length
            then Value
               with Size => Length * 8;
         Value : Opaque;
      end message;
end TLV;
"""
)
class MySimulator(rflx.simulator.Simulator):
    def tlv_message_checksum(x: int) -> int:
        return crc(x)

Pro: Can be statically type-checked by mypy, central place for checksum functions, different child classes associated with different specs
Con: Name mangling of checksum function may become confusing, more code
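
How such a class decorator could collect the checksum functions is sketched below. Everything here is hypothetical: the `Simulator` base class is a stand-in for `rflx.simulator.Simulator`, the `<package>_<message>_checksum` naming rule is an assumption, the spec string is a placeholder, and the checksum is a simple byte sum rather than a real CRC.

```python
from typing import Callable, Dict, Type


class Simulator:
    """Hypothetical base class; stands in for rflx.simulator.Simulator."""

    specification: str = ""
    checksums: Dict[str, Callable[..., int]] = {}


def from_string(specification: str) -> Callable[[Type[Simulator]], Type[Simulator]]:
    """Class decorator that stores the spec text and collects checksum functions."""
    suffix = "_checksum"

    def decorate(cls: Type[Simulator]) -> Type[Simulator]:
        cls.specification = specification
        # Assumed name mangling: <package>_<message>_checksum
        cls.checksums = {
            name[: -len(suffix)]: getattr(cls, name)
            for name in vars(cls)
            if name.endswith(suffix)
        }
        return cls

    return decorate


@from_string("package TLV is ... end TLV;")  # placeholder, not a complete spec
class MySimulator(Simulator):
    @staticmethod
    def tlv_message_checksum(data: bytes) -> int:
        return sum(data) % 256  # placeholder, not a real CRC
```

Because the functions are ordinary, annotated members of the class, a type checker sees them without any plugin; the mapping from function name to message is the part that may become confusing.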

Conclusion

~~Implement 1.5~~ (mypy does not provide hooks necessary to check this version)
Implement 1.6

Data Getter Design

Option 2.1

tag = message.tag
value = message.value

Pro: Better readability, natural to use
Con: Translation from RecordFlux names to Python names is necessary to avoid style check issues
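
The name translation mentioned above could look roughly like this. The rule used here (snake_case, with a trailing underscore for Python keywords) is an assumption for illustration, and `python_name` is a made-up helper.

```python
import keyword
import re


def python_name(identifier: str) -> str:
    """Map a RecordFlux-style identifier to a snake_case Python name."""
    # Insert "_" at lower-to-upper boundaries, e.g. "MsgData" -> "Msg_Data".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", identifier).lower()
    # Avoid clashes with Python keywords, e.g. a field called "From".
    return name + "_" if keyword.iskeyword(name) else name


names = [python_name(n) for n in ("Tag", "Length", "Msg_Data", "From")]
```

Two distinct specification names may map to the same Python name under such a rule, so a real implementation would also need a collision check.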

Option 2.2

tag = message["Tag"]
value = message["Value"]

Pro: Field names identical to spec, iteration over fields could be implemented on top
Con: Verbose

Option 2.3

tag = message.get("Tag")
value = message.get("Value")

Pro: Field names identical to spec
Con: Verbose

Conclusion

Implement 2.1

Data Setter Design

Option 3.1

message.tag = tag
message.value = value

Pro: Better readability, natural to use
Con: Translation from RecordFlux names to Python names is necessary to avoid style check issues

Option 3.2

message["Tag"] = tag
message["Value"] = value

Pro: Field names identical to spec, iteration over fields could be implemented on top
Con: Verbose, calls need to be in right order

Option 3.3

message.set("Tag", tag)
message.set("Value", value)

Pro: Field names identical to spec
Con: Verbose, calls need to be in right order

Option 3.4

message.set({"Tag": tag, "Value": value})

Pro: Field names identical to spec, great flexibility
Con: Partial update may not be possible

Option 3.5

~~message = { "Tag": value }~~

Assignment cannot be overloaded in Python

Conclusion

Implement 3.1
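
Attribute-style setters can still check invariants strictly. The sketch below is hypothetical (the class `SettableMessage` and its validator table are not an existing API); it rejects unknown field names and values outside the ranges given in the TLV specification.

```python
from typing import Any, Callable, Dict


class SettableMessage:
    """Hypothetical message with attribute-style setters (Option 3.1)."""

    # Per-field checks; the limits mirror the Tag and Length types of the TLV spec.
    _validators: Dict[str, Callable[[Any], bool]] = {
        "tag": lambda v: v in (1, 3),
        "value": lambda v: isinstance(v, bytes) and len(v) < 2**16,
    }

    def __setattr__(self, name: str, value: Any) -> None:
        if name not in self._validators:
            raise AttributeError(f"unknown field: {name}")
        if not self._validators[name](value):
            raise ValueError(f"invalid value for {name}: {value!r}")
        super().__setattr__(name, value)


message = SettableMessage()
message.tag = 1
message.value = b"\xde\xad"
```

A misspelled field name fails immediately instead of silently creating a new attribute, which is the main risk of plain attribute assignment.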

Message Serializer Design

Option 4.1

data = message.serialize()

Pro: Natural interface
Con:

Option 4.2

data = bytes(message)

Pro: Natural interface, very pythonic
Con:

Conclusion

Implement 4.2
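
Supporting `bytes(message)` only requires a `__bytes__` method. A minimal sketch for the TLV message, assuming big-endian byte order and using a hypothetical class `SerializableMessage`:

```python
import struct


class SerializableMessage:
    """Hypothetical TLV message; only the serializer hook is of interest."""

    def __init__(self, tag: int, value: bytes = b"") -> None:
        self.tag = tag
        self.value = value

    def __bytes__(self) -> bytes:
        if self.tag == 3:  # Msg_Error: no Length and Value fields
            return bytes([self.tag])
        # Tag (8 bit), Length (16 bit, big-endian assumed), Value
        return struct.pack(">BH", self.tag, len(self.value)) + self.value


data = bytes(SerializableMessage(tag=1, value=b"\xde\xad"))
```

The result equals the Msg_Data test vector `01 00 02 de ad`.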

Checksum Design

Option 5.1

message.checksum = lambda x: crc(x)

Pro: Natural interface
Con: Must be set per message, not suitable for parsing

Option 5.2

simulator = rflx.Simulator(
    "tlv.rflx",
    checksums={
        "TLV": {
            "Message": {
                "checksum": lambda x: crc(x)
            }
        }
    }
).tlv.message

message = simulator.parse(data)

Pro: Checksum only set per simulator instance
Con: TLV.Message addressed in two distinct places / ways

Option 5.3

Cf. 1.5

Conclusion

Implement 5.3
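
Passing the checksum function when the message object is created makes it available for both parsing and generation. The sketch below is hypothetical: the class `ChecksummedMessage` and its layout (payload followed by a 4-byte big-endian checksum) are made up for illustration, and `zlib.crc32` stands in for the `crc` function used in the options above.

```python
import zlib
from typing import Callable


class ChecksummedMessage:
    """Hypothetical message type; payload + 4-byte checksum layout is made up."""

    def __init__(self, checksum: Callable[[bytes], int]) -> None:
        self._checksum = checksum
        self.payload = b""

    def parse(self, data: bytes) -> None:
        # Verify the trailer with the function supplied at construction.
        payload, trailer = data[:-4], data[-4:]
        if self._checksum(payload) != int.from_bytes(trailer, "big"):
            raise ValueError("checksum mismatch")
        self.payload = payload

    def __bytes__(self) -> bytes:
        # The checksum is calculated automatically during serialization.
        return self.payload + self._checksum(self.payload).to_bytes(4, "big")


message = ChecksummedMessage(checksum=zlib.crc32)
message.payload = b"\xde\xad"
encoded = bytes(message)

received = ChecksummedMessage(checksum=zlib.crc32)
received.parse(encoded)
```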

Enumeration Literals Design

Option 6.1

if message.tag == 3:
   pass

Pro: Compatible with int
Con: Error prone, user needs to perform mapping from enum to integer manually

Option 6.2

if message.tag == simulator.tlv.msg_error:
   pass

Pro: Compatible with int (when based on IntEnum)
Con:

Conclusion

Implement 6.2
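
With `enum.IntEnum`, literals stay comparable to plain integers while unknown values are rejected. A minimal sketch for the `Tag` type of the TLV specification; the upper-case member names are an assumption about how literals might be exposed in Python.

```python
from enum import IntEnum


class Tag(IntEnum):
    """Literals of the TLV Tag type; the Python-style names are an assumption."""

    MSG_DATA = 1
    MSG_ERROR = 3


tag = Tag(3)  # e.g. the value read from the wire
is_error = tag == Tag.MSG_ERROR
```

Constructing `Tag` from a value that is not a literal of the type, such as 2, raises `ValueError`, which corresponds to the invalid tag case.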

Summary - Message Parser

@rflx.simulator.from_file("tlv.rflx")
class MySimulator(rflx.simulator.Simulator):
    def tlv_message_checksum(x: int) -> int:
        return crc(x)

simulator = MySimulator()
simulator.tlv.message.parse(data)

tag = simulator.tlv.message.tag
value = simulator.tlv.message.value

if tag == simulator.tlv.msg_error:
   simulator.tlv.message.tag = new_tag

socket.send(bytes(simulator.tlv.message))

Related Work - State Machines

Pysmlib

https://darcato.github.io/pysmlib/docs/html/index.html

FiniteStateMachines

https://github.com/jaypantone/FiniteStateMachines

PythonStateMachine

https://python-statemachine.readthedocs.io/en/latest/index.html

Transitions

https://github.com/pytransitions/transitions

PySM

https://pysm.readthedocs.io/en/latest/#

StateEngine

https://github.com/aymanimtyaz/StateEngine

senier added the labels "architectural decision" (Discussion of design decision), "simulator" (Related to simulator package (Python API)), and "topic" (Large feature/change) on Dec 9, 2021

senier self-assigned this Dec 9, 2021

senier removed their assignment Jan 13, 2022

senier mentioned this issue Aug 30, 2022

Allow setting a refined opaque field directly #505

Closed

This was referenced Nov 29, 2022

Refactor PyRFLX #422

Closed

Session support in PyRFLX #293

Closed

senier self-assigned this Nov 30, 2022
