txth format specification #840

TimmRuppert · 2024-10-29T13:43:28Z

TimmRuppert
Oct 29, 2024
Collaborator

I am struggling with the .txth format. The documentation states:

*.txth
Human-readable plain-text trace file. Messages are separated by newlines.

I would understand this to mean that the newline character is only used to separate top-level messages. In that case the file would look something like the following, although it is not clearly defined what each line would actually look like:

somehow_readable_serialized_message1
somehow_readable_serialized_message2

If you use osi2read.py on a SensorView .osi you end up with something like this:

version {
  version_major: 3
  version_minor: 7
  version_patch: 0
}
timestamp {
  seconds: 0
  nanos: 100000000
}
sensor_id {
  value: 0
}
[...] // removed most part for readability
host_vehicle_id {
  value: 113
} // <------ This is where the first SensorView ends
version {
  version_major: 3
  version_minor: 7
  version_patch: 0
}
timestamp {
  seconds: 0
  nanos: 200000000
}
sensor_id {
  value: 0
} 
[...] // removed most part for readability
host_vehicle_id {
  value: 113
} // <------ This is where the second SensorView ends
version {
  version_major: 3
  version_minor: 7
  version_patch: 0
}
timestamp {
  seconds: 0
  nanos: 300000000
}
sensor_id {
  value: 0
}
[...] // removed most part for readability
host_vehicle_id {
  value: 113
} // <------ This is where the third SensorView ends
[...]

So the newline character is clearly also used within messages (to separate sub-messages and fields), and the number of lines may vary depending on the content (repeated fields etc.). The format of each top-level message seems to follow protobuf's "reverse-engineered" Text Format Language Specification, where "incompatibilities are likely to exist" between languages.

While this format might be useful for debugging, it is clearly a bit challenging to automatically parse/import as you do not clearly know when a new top-level message begins.
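To illustrate the ambiguity, here is a minimal sketch that lists the top-level keys of a txth-style dump by tracking brace depth. It assumes the layout shown above (one `key {`, `key: value`, or `}` per line, without the annotation comments I added for this post); the function name `top_level_keys` and the sample text are made up for illustration and are not part of OSI or protobuf.

```python
def top_level_keys(text: str) -> list[str]:
    """Return the keys found at nesting depth 0, in file order."""
    keys = []
    depth = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "}":
            # Closing brace of a sub-message.
            depth -= 1
            continue
        if depth == 0:
            # "version {" -> "version", "value: 0" -> "value"
            keys.append(line.split(":", 1)[0].split("{", 1)[0].strip())
        if line.endswith("{"):
            # Opening brace of a sub-message.
            depth += 1
    return keys


SAMPLE = """\
version {
  version_major: 3
}
timestamp {
  seconds: 0
  nanos: 100000000
}
version {
  version_major: 3
}
timestamp {
  seconds: 0
  nanos: 200000000
}
"""

print(top_level_keys(SAMPLE))
```

The result is one flat sequence of keys; nothing in the text itself marks where the first message stops and the second one starts.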

Ignoring the possible language incompatibilities mentioned by protobuf, one could read the first line of the file to get the first key at the outermost level of the .txth file and use that as a separator (assuming the order does not change with different content). In the above example this would be the "version" key. I assume the order is maintained, so it might be okay to assume it is always version for all OSI3 top-level messages, but I have not found anything about field order in protobuf's Text Format Language Specification.
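A rough sketch of that heuristic, under the same assumptions: the first top-level key of the file is taken as the separator, and a new message starts whenever that key reappears at depth 0. The name `split_txth` is hypothetical. The approach would break if the first field is repeated, or if it is unset in some message and therefore not printed.

```python
def split_txth(text: str) -> list[str]:
    """Split a txth-style dump into top-level messages (heuristic)."""
    messages = []
    current = []
    separator = None  # first top-level key seen in the file
    depth = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if depth == 0 and line != "}":
            key = line.split(":", 1)[0].split("{", 1)[0].strip()
            if separator is None:
                separator = key
            elif key == separator and current:
                # Separator key seen again: close the previous message.
                messages.append("\n".join(current))
                current = []
        current.append(raw)
        if line == "}":
            depth -= 1
        elif line.endswith("{"):
            depth += 1
    if current:
        messages.append("\n".join(current))
    return messages


DUMP = """\
version {
  version_major: 3
}
timestamp {
  seconds: 0
  nanos: 100000000
}
version {
  version_major: 3
}
timestamp {
  seconds: 0
  nanos: 200000000
}
"""

parts = split_txth(DUMP)
print(len(parts))
```

Each returned chunk could then be handed to a text-format parser for the expected message type, but whether that round-trips faithfully is exactly what I am unsure about.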

Is my understanding of txth correct?

In case my understanding is correct:

- Does anyone see a more robust way to separate messages while reading a .txth file?
- Does anyone know whether the order of fields/sub-messages within a printed top-level message will always be the same when using the built-in string/print conversion?
- Shouldn't the OSI documentation mention how each message is formatted in a "human-readable" way?
  - e.g. by mentioning protobuf's "Text Format Language Specification"
- If .txth is not really used for anything other than manual human inspection, shouldn't it be marked deprecated in the near future and removed from the OSI spec in a future major version?
  - This does not imply that tools like osi2read.py could not continue to exist for debugging purposes.

@jdsika, @pmai this is currently holding me back from finishing my PR for the osi-utilities; it would really help me if you could quickly clarify .txth a bit for me.

Answered by pmai

pmai · 2024-10-30T07:40:34Z

pmai
Oct 30, 2024
Maintainer

I am struggling with the .txth format. The documentation states:

*.txth
Human-readable plain-text trace file. Messages are separated by newlines.

I would understand this such that the new line character is only used to separate top-level messages.

If it meant that, it would state it as such, I would say.

The txth format was, as far as I can tell, quickly devised as a debugging aid, where one-way conversion from *.osi is sufficient, and not necessarily as a format that is easy to round-trip. As such it is a bit lossy and, as you point out, not easy to parse back into the original data stream.

Would I have designed this format differently? Likely. Does this pose much of a problem currently? Not one that people complain about much.

If we came up with a better format in the future, with added requirements, one could think of deprecating the current txth format. Until that time I don't see the need to either deprecate the current format or put too much effort into specifying it, as this will only lead people down the wrong path. Actually specifying a round-trippable human-readable format is also a bit of work to get right, so while I can see use cases for this, I have yet to see a cost-benefit analysis that warrants the effort.

We might want to more plainly spell out the current state of affairs in the standard.

1 reply

TimmRuppert Oct 30, 2024
Collaborator Author

Thanks for clarifying my assumptions about the format! This helps me to proceed.

If it meant that, it would state it as such, I would say.

In an ideal world where documentation is always clear and flawless, yes ;) However, as I've mentioned, there’s no specific requirement for how the human-readable format of a message should appear within the file, which leaves some room for interpretation.

Until that time I don't see the need to either deprecate or put too much effort into specifying the current format, as this will only lead people down a wrong path. Actually specifying a round-trippable human readable format is also a bit of work to get right

In my opinion, developing a round-trippable human-readable format isn't really worthwhile. Clearly, the txth format is primarily used for debugging; otherwise I'd expect more people to have questions similar to mine. That said, I'd be happy to be proven wrong.

We might want to more plainly spell out the current state of affairs in the standard.

Totally agree. I’d suggest referencing the Protocol Buffers Text Format Language Specification and noting that this format is primarily intended for examining exported binary trace files, rather than serving as a fully round-trippable, human-readable format.
