Implement simulator #873

Open
senier opened this issue Dec 9, 2021 · 0 comments
Labels: architectural decision (Discussion of design decision), simulator (Related to simulator package (Python API)), topic (Large feature/change)

Background

Goals:

  • Replace PyRFLX

  • From a RecordFlux specification

    • Parse messages easily
    • Generate messages
    • Run sessions
  • Pythonic / natural implementation

    • Strict checking of invariants nonetheless
    • Suitable for typed Python
    • All values that can be calculated are set automatically
    • Values that are set automatically cannot be set manually
    • Code should have no style issues (e.g. pylint)
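The two goals about automatically calculated values can be illustrated with a small sketch. It is not the planned simulator API: the class name `TLVMessage` and its constructor are assumptions made for illustration. The derived field is exposed as a read-only property, so it is always consistent with the data and cannot be set by hand.

```python
class TLVMessage:
    """Hypothetical message model; `length` is derived from `value`."""

    def __init__(self, tag: int, value: bytes = b"") -> None:
        self.tag = tag
        self.value = value

    @property
    def length(self) -> int:
        # Calculated automatically. The property has no setter, so a manual
        # assignment such as `message.length = 5` raises AttributeError.
        return len(self.value)


message = TLVMessage(tag=1, value=b"\xde\xad")
```

Reading `message.length` yields the number of bytes in `message.value`, and it follows any later change to `value` without further calls.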

Related Work - Message Parsing / Generation

https://github.com/Componolit/rflx_simulator_experiments

Examples

(tests/data/specs/tlv.rflx)

package TLV is

   type Tag is (Msg_Data => 1, Msg_Error => 3) with Size => 8;
   type Length is range 0 .. 2 ** 16 - 1 with Size => 16;

   type Message is
      message
         Tag : Tag
            then Length
               if Tag = Msg_Data
            then null
               if Tag = Msg_Error;
         Length : Length
            then Value
               with Size => Length * 8;
         Value : Opaque;
      end message;
end TLV;
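To make the intended semantics of this specification concrete, here is a minimal hand-written reference parser using only the standard library. It is a sketch, not generated code: the function name `parse_tlv`, the tuple result and the error handling are assumptions. The 16-bit length is read as big-endian, which matches the test data shown in this issue.

```python
import struct
from typing import Optional, Tuple

MSG_DATA = 1
MSG_ERROR = 3


def parse_tlv(data: bytes) -> Tuple[int, Optional[int], Optional[bytes]]:
    """Return (tag, length, value); length and value are None for Msg_Error."""
    if not data:
        raise ValueError("empty input")
    tag = data[0]
    if tag == MSG_ERROR:
        # "then null if Tag = Msg_Error": the message ends after the tag.
        return tag, None, None
    if tag != MSG_DATA:
        raise ValueError(f"invalid tag: {tag}")
    if len(data) < 3:
        raise ValueError("truncated length field")
    # Length is a 16-bit unsigned integer, big-endian.
    (length,) = struct.unpack_from(">H", data, 1)
    value = data[3 : 3 + length]
    if len(value) != length:
        raise ValueError("truncated value field")
    return tag, length, value
```

Any library compared in this issue should agree with this function on valid Msg_Data and Msg_Error messages and reject tags that the enumeration does not define.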

Test Data - Msg_Data

01 00 02 de ad

Test Data - Msg_Error

03

Test Data - Invalid Tag

02

Construct

https://github.com/construct/construct

Specification

import construct
from construct import this

CONSTRUCT_TLV = construct.Struct(
   "tag" / construct.Enum(construct.Int8ub, MSG_DATA=1, MSG_ERROR=3),
   construct.StopIf(this.tag != 1),
   "length" / construct.Int16ub,
   "value" / construct.Bytes(this.length)
)

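This specification is probably wrong: it does not make "length" and "value" optional as I would have expected. One possible cause is that construct's Enum parses a known tag to a string-like label rather than an int, in which case this.tag != 1 would always be true and StopIf would always stop. The project has extensive documentation, but it often does not cover the more complicated cases.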

Parsing

result = CONSTRUCT_TLV.parse(b"\x01\x00\x02\xde\xad")
assert result.tag == CONSTRUCT_TLV.tag.MSG_DATA

# The specification does not parse correctly!
# assert result.length == 2
# assert result.value == b"\xde\xad"

Generation

assert (
   CONSTRUCT_TLV.build({"tag": 1, "length": 2, "value": b"\xde\xad"})
   == b"\x01\x00\x02\xde\xad"
)

Hachoir

https://github.com/vstinner/hachoir

Specification

import hachoir.field
import hachoir.stream
from hachoir.parser import Parser

class HachoirTLV(Parser):
   tag_types = {
      1: "Msg_Data",
      3: "Msg_Error"
   }

   endian = hachoir.stream.BIG_ENDIAN

   def createFields(self):
      yield hachoir.field.Enum(
         hachoir.field.UInt8(self, "tag", "Tag"), self.tag_types
      )

      # Length and Value are only present for Msg_Data
      if self["tag"].value == 1:
         yield hachoir.field.UInt16(self, "length", "Length")
         yield hachoir.field.Bytes(self, "value", self["length"].value)

Parsing

tlv = HachoirTLV(hachoir.stream.StringInputStream(TEST_DATA_DATA))
assert tlv["tag"].value == 1
assert tlv["length"].value == 2
assert tlv["value"].value == b'\xde\xad'

Generation

Not supported (there is an editor module to change parsed data, though).

Kaitai Struct

https://kaitai.io/

Specification

meta:
   id: tlv
   endian: be
seq:
   - id: tag
     type: u1
     enum: tag
   - id: len_value
     type: u2
     if: tag == tag::data
   - id: value
     size: len_value
     if: tag == tag::data
enums:
   tag:
     1: data
     3: error

The specification is translated to Python code using the Kaitai Struct compiler (ksc):

$ ksc --target python tlv.ksy

The resulting tlv.py file contains the parser. A support library (kaitaistruct) is required for it to work.

Parsing

tlv = kaitai.Tlv.from_bytes(TEST_DATA_DATA)
assert tlv.tag == kaitai.Tlv.Tag.data
assert tlv.len_value == 2
assert tlv.value == b'\xde\xad'

Generation

Not supported

Python Suitcase

https://github.com/digidotcom/python-suitcase

Specification

import suitcase.fields
from suitcase.structure import Structure

class SuitcaseTLV(Structure):
   tag = suitcase.fields.UBInt8()
   length = suitcase.fields.ConditionalField(
      suitcase.fields.LengthField(suitcase.fields.UBInt16()),
      lambda m: m.tag == 1
   )
   value = suitcase.fields.ConditionalField(
      suitcase.fields.Payload(length),
      lambda m: m.tag == 1
   )

Parsing

tlv = SuitcaseTLV()
tlv.unpack(TEST_DATA_DATA)
assert tlv.tag == 1
assert tlv.length == 2
assert tlv.value == b'\xde\xad'

Generation

tlv = SuitcaseTLV()
tlv.tag = 1

# Length field is calculated automatically
# When trying to set it, we get:
# suitcase.exceptions.SuitcaseProgrammingError:
# Cannot set the value of a LengthField
# tlv.length = 2

tlv.value = b'\xde\xad'
assert tlv.pack() == TEST_DATA_DATA

Scapy

https://scapy.net/

Specification

import scapy.all as scapy

class ScapyTLV(scapy.Packet):
    fields_desc = [
        scapy.ByteEnumField(
            "tag",
            0,
            {
                1: "DATA",
                3: "ERROR"
            }
        ),
        scapy.ConditionalField(
            scapy.FieldLenField("len", None, length_of="value"),
            lambda pkt: pkt.tag == 1
        ),
        scapy.ConditionalField(
            # The field name must match length_of above and the accessors below
            scapy.StrLenField(
                "value",
                "",
                length_from=lambda pkt: pkt.len
            ),
            lambda pkt: pkt.tag == 1
        )
    ]

Parsing

result = ScapyTLV(TEST_DATA_DATA)
assert result.tag == 1
assert result.len == 2
assert result.value == b'\xde\xad'

Generation

result = ScapyTLV(tag = 1, value = b'\xde\xad')
assert scapy.raw(result) == TEST_DATA_DATA

Message Parser Design

Option 1.1

simulator = rflx.Simulator("tlv.rflx").tlv.message
message = simulator.parse(data)

Pro: Natural interface
Con: Static typing may be impossible / hard

Option 1.2

message = rflx.Simulator("tlv.rflx", ["TLV", "Message"], data)

Pro: Short, easier / more natural to use programmatically
Con: Generation is asymmetric, no reuse of same model with different data

Option 1.3

message = rflx.Simulator("tlv.rflx").tlv.message
message.unpack(data)

Pro: Symmetric interface possible, reuse of previously parsed message
Con: Stateful

Option 1.4

simulator = rflx.Simulator("tlv.rflx")["TLV"]["Message"]
message = simulator.parse(data)

Pro: Easier / more natural to use programmatically
Con: Static typing may be impossible / hard

Option 1.5

simulator = rflx.Simulator("tlv.rflx")
message = simulator.tlv.message(checksum=lambda x: crc(x))
message.parse(data)

Pro: Place where checksum functions (and later parameters) are passed is consistent with specification, package hierarchy in the spec is mirrored by the code
Con: Checksum needs to be passed whenever a message is constructed

Option 1.6

@rflx.simulator.from_file("tlv.rflx")
class MySimulator(rflx.simulator.Simulator):
    def tlv_message_checksum(self, x: int) -> int:
        return crc(x)

simulator = MySimulator()
simulator.tlv.message.parse(data)

Alternative version with inline specification:

@rflx.simulator.from_string(
"""
package TLV is

   type Tag is (Msg_Data => 1, Msg_Error => 3) with Size => 8;
   type Length is range 0 .. 2 ** 16 - 1 with Size => 16;

   type Message is
      message
         Tag : Tag
            then Length
               if Tag = Msg_Data
            then null
               if Tag = Msg_Error;
         Length : Length
            then Value
               with Size => Length * 8;
         Value : Opaque;
      end message;
end TLV;
"""
)
class MySimulator(rflx.simulator.Simulator):
    def tlv_message_checksum(self, x: int) -> int:
        return crc(x)

Pro: Can be statically type-checked by mypy, central place for checksum functions, different child classes associated with different specs
Con: Name mangling of checksum function may become confusing, more code
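The name mangling mentioned for option 1.6 can be sketched with a plain class decorator. This is an illustration only, not the `rflx.simulator` implementation: the attributes `specification` and `checksum_functions` are assumptions, the spec string is a placeholder that is not parsed, and the checksum is a stand-in rather than a real CRC.

```python
from typing import Callable, Type, TypeVar

T = TypeVar("T")
SUFFIX = "_checksum"


def from_string(spec: str) -> Callable[[Type[T]], Type[T]]:
    """Hypothetical class decorator: attach a spec and collect checksum hooks."""

    def decorate(cls: Type[T]) -> Type[T]:
        setattr(cls, "specification", spec)
        # Convention assumed here: <package>_<message>_checksum, lower case.
        hooks = {
            name[: -len(SUFFIX)]: getattr(cls, name)
            for name in vars(cls)
            if name.endswith(SUFFIX)
        }
        setattr(cls, "checksum_functions", hooks)
        return cls

    return decorate


@from_string("package TLV is end TLV;")  # placeholder text, never parsed
class MySimulator:
    @staticmethod
    def tlv_message_checksum(data: bytes) -> int:
        return sum(data) % 256  # stand-in, not a real CRC
```

The lookup key `"tlv_message"` shows where the confusion can come from: a package or message name that itself contains an underscore makes the mapping back to the specification ambiguous.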

Conclusion

Implement 1.6
Do not implement 1.5 (mypy does not provide the hooks necessary to check that version)

Data Getter Design

Option 2.1

tag = message.tag
value = message.value

Pro: Better readability, natural to use
Con: RecordFlux names must be translated to Python naming conventions (e.g. Msg_Data to msg_data) to avoid style check issues

Option 2.2

tag = message["Tag"]
value = message["Value"]

Pro: Field names identical to spec, iteration over fields could be implemented on top
Con: Verbose

Option 2.3

tag = message.get("Tag")
value = message.get("Value")

Pro: Field names identical to spec
Con: Verbose

Conclusion

Implement 2.1
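The name translation that option 2.1 requires could look like the following sketch. The function name `python_name` is hypothetical, and the keyword handling is an assumption that the issue does not discuss. Lowercasing matches the names used in the examples here (`tlv`, `message`, `msg_error`).

```python
import keyword


def python_name(rflx_name: str) -> str:
    """Map a RecordFlux identifier such as "Msg_Data" to a Python attribute name."""
    name = rflx_name.lower()
    # Assumption: names that clash with Python keywords get a trailing underscore.
    return name + "_" if keyword.iskeyword(name) else name
```

Two specification names that differ only in case would collide under this scheme, so a real implementation would need to detect that.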

Data Setter Design

Option 3.1

message.tag = tag
message.value = value

Pro: Better readability, natural to use
Con: RecordFlux names must be translated to Python naming conventions (e.g. Msg_Data to msg_data) to avoid style check issues

Option 3.2

message["Tag"] = tag
message["Value"] = value

Pro: Field names identical to spec, iteration over fields could be implemented on top
Con: Verbose, calls need to be in right order

Option 3.3

message.set("Tag", tag)
message.set("Value", value)

Pro: Field names identical to spec
Con: Verbose, calls need to be in right order

Option 3.4

message.set({"Tag": tag, "Value": value})

Pro: Field names identical to spec, great flexibility
Con: Partial update may not be possible

Option 3.5

message = {"Tag": tag}

Con: Not implementable. Plain assignment cannot be overloaded in Python; this statement only rebinds the name message to a new dict.

Conclusion

Implement 3.1
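Attribute-style setters still have to enforce the invariants listed under the goals. The sketch below shows one way to do that with properties and `__slots__`. The class name `CheckedMessage` and the exact checks are assumptions; the valid tag values are taken from the TLV specification.

```python
from typing import Optional


class CheckedMessage:
    """Hypothetical message with attribute-style setters (option 3.1)."""

    # Without __slots__, a typo such as `message.tga = 1` would silently
    # create a new attribute instead of failing.
    __slots__ = ("_tag", "_value")

    VALID_TAGS = (1, 3)  # Msg_Data, Msg_Error

    def __init__(self) -> None:
        self._tag: Optional[int] = None
        self._value: bytes = b""

    @property
    def tag(self) -> Optional[int]:
        return self._tag

    @tag.setter
    def tag(self, tag: int) -> None:
        if tag not in self.VALID_TAGS:
            raise ValueError(f"invalid tag: {tag}")
        self._tag = tag

    @property
    def value(self) -> bytes:
        return self._value

    @value.setter
    def value(self, value: bytes) -> None:
        # In the specification, Value is only reachable if Tag = Msg_Data.
        if self._tag != 1:
            raise ValueError("value can only be set when tag is Msg_Data")
        self._value = bytes(value)
```

The ordering constraint does not disappear with this interface: setting `value` before `tag` is rejected here, just as it would be with the dictionary-style setters.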

Message Serializer Design

Option 4.1

data = message.serialize()

Pro: Natural interface
Con: None identified

Option 4.2

data = bytes(message)

Pro: Natural interface, very pythonic
Con: None identified

Conclusion

Implement 4.2
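Option 4.2 relies on the `__bytes__` special method, which Python calls for `bytes(obj)`. A minimal sketch for the TLV message follows. The class name `SerializableMessage` is hypothetical, and the layout (one tag byte, 16-bit big-endian length, then the value) is taken from the specification and test data in this issue.

```python
import struct


class SerializableMessage:
    """Hypothetical TLV message that supports bytes(message)."""

    def __init__(self, tag: int, value: bytes = b"") -> None:
        self.tag = tag
        self.value = value

    def __bytes__(self) -> bytes:
        if self.tag == 3:
            # Msg_Error consists of the tag only.
            return bytes([self.tag])
        # Msg_Data: tag, 16-bit big-endian length, then the value itself.
        return struct.pack(">BH", self.tag, len(self.value)) + self.value
```

With this in place, a call such as `socket.send(bytes(message))` needs no simulator-specific serialization method.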

Checksum Design

Option 5.1

message.checksum = lambda x: crc(x)

Pro: Natural interface
Con: Must be set per message, not suitable for parsing

Option 5.2

simulator = rflx.Simulator(
    "tlv.rflx",
    checksums={
        "TLV": {
            "Message": {
                "checksum": lambda x: crc(x)
            }
        }
    }
).tlv.message

message = simulator.parse(data)

Pro: Checksum only set per simulator instance
Con: TLV.Message addressed in two distinct places / ways

Option 5.3

Cf. 1.5

Conclusion

Implement 5.3
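Passing the checksum function when the message object is created, as in option 1.5, makes it available for both parsing and generation. The sketch below is an assumption-laden illustration: the TLV example has no checksum field, so the class `ChecksummedMessage` and its `verify` method are invented for this purpose, and `zlib.crc32` stands in for the `crc` function used above.

```python
import zlib
from typing import Callable


class ChecksummedMessage:
    """Hypothetical message that receives its checksum function up front."""

    def __init__(self, checksum: Callable[[bytes], int]) -> None:
        self._checksum = checksum

    def compute(self, payload: bytes) -> int:
        # Used when generating a message.
        return self._checksum(payload)

    def verify(self, payload: bytes, expected: int) -> bool:
        # Used when parsing a message.
        return self._checksum(payload) == expected


message = ChecksummedMessage(checksum=zlib.crc32)
```

Because the callable is stored on the object, the same function is applied in both directions without being set again per operation.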

Enumeration Literals Design

Option 6.1

if message.tag == 3:
   pass

Pro: Compatible with int
Con: Error prone, user needs to perform mapping from enum to integer manually

Option 6.2

if message.tag == simulator.tlv.msg_error:
   pass

Pro: Compatible with int (when based on IntEnum)
Con: None identified

Conclusion

Implement 6.2
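The int compatibility noted for option 6.2 comes from `enum.IntEnum` in the standard library. The sketch below hand-writes the enumeration from the TLV specification; in the simulator it would presumably be derived from the specification instead, and the class name `Tag` and member spelling are assumptions.

```python
from enum import IntEnum


class Tag(IntEnum):
    """Enumeration literals of TLV.Tag, written out by hand for illustration."""

    MSG_DATA = 1
    MSG_ERROR = 3


tag = Tag(3)  # convert a raw integer from the wire into a literal
```

The member compares equal to both `Tag.MSG_ERROR` and the integer `3`, while `Tag(2)` raises `ValueError` because the specification defines no literal with that value.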

Summary - Message Parser

@rflx.simulator.from_file("tlv.rflx")
class MySimulator(rflx.simulator.Simulator):
    def tlv_message_checksum(self, x: int) -> int:
        return crc(x)

simulator = MySimulator()
simulator.tlv.message.parse(data)

tag = simulator.tlv.message.tag
value = simulator.tlv.message.value

if tag == simulator.tlv.msg_error:
   simulator.tlv.message.tag = new_tag

socket.send(bytes(simulator.tlv.message))

Related Work - State Machines

Pysmlib

https://darcato.github.io/pysmlib/docs/html/index.html

FiniteStateMachines

https://github.com/jaypantone/FiniteStateMachines

PythonStateMachine

https://python-statemachine.readthedocs.io/en/latest/index.html

Transitions

https://github.com/pytransitions/transitions

PySM

https://pysm.readthedocs.io/en/latest/#

StateEngine

https://github.com/aymanimtyaz/StateEngine

@senier added the architectural decision, simulator, and topic labels Dec 9, 2021
@senier senier self-assigned this Dec 9, 2021
@senier senier removed their assignment Jan 13, 2022
This was referenced Nov 29, 2022
@senier senier self-assigned this Nov 30, 2022