Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization merge #1060

Open
GreyCat opened this issue Jul 27, 2023 · 14 comments
Open

Serialization merge #1060

GreyCat opened this issue Jul 27, 2023 · 14 comments
Milestone

Comments

@GreyCat
Copy link
Member

GreyCat commented Jul 27, 2023

Following up #27, @generalmimon done tremendous job at implementing serialzation for some languages. However, it's not yet merged into master, thus so far these live into separate "serialization" branches:

It's documented in https://doc.kaitai.io/serialization.html, but ultimately our goal obviously would be not supporting two separate forks, but having one that does everything.

I wanted to start making itemized list of tasks of what exactly needs to be done to merge these back to master.

@generalmimon, can I ask you to share your thoughts on this?

@generalmimon
Copy link
Member

generalmimon commented Jul 27, 2023

It's documented in https://doc.kaitai.io/serialization.html, but ultimately our goal obviously would be not supporting two separate forks, but having one that does everything.

Of course. I'd love to do this - generally speaking, serialization in Python and Java is pretty much finished at this point and I believe the serialization branches are mostly ready for the merge. Let me assess the situation in individual repos:

  • kaitai_struct_tests: The aggregate/convert_to_json script (and related XML "parsers") has to be changed to read the test name not only from the test method name, but to take the combination of test class name + test method name as a unique test.

    This is needed mainly because in both Java and Python, serialization is tested primarily via a generic test method called testReadWriteRoundtrip / test_read_write_roundtrip, which is defined in the CommonSpec abstract base class. Unfortunately, from what I've seen, the convert_to_json script currently takes only the test method name as the name of the test, so different testReadWriteRoundtrip tests from all test formats are mistaken for just one test in the output ci.json file.

  • kaitai_struct_tests: Another thing to note here (but there's probably no need to do anything about it before the merge, it can be done after that) is that I've added ~60 new test formats during the serialization work, so these should eventually be generated from KST / ported to other languages. But I think our testing system doesn't care about additional test formats / KST specs without language-specific specs, so this can be done gradually for individual languages (no need to do it at once).

  • kaitai_struct_compiler: Since the serialization branch is up-to-date with the 0.10 tag, I don't anticipate almost any problems with the merge. The only feature I'm personally afraid of are the "zero-copy" substreams (Data views / substreams support #44), which I haven't thought of at all in the context of serialization. I think if we want to get serialization merged to master soon and avoid problems, for now I'd let the --read-write option imply --zero-copy-substream false (kaitai-io/kaitai_struct_compiler@690af1b).

    Or perhaps (but I don't know much about the substream implementation), the substreams could be only used for parsing, but serialization would still use the old way with copying for now.

  • kaitai_struct_java_runtime, kaitai_struct_python_runtime: Probably the least problematic. They could be perhaps merged right now because they actually only add APIs and the changes are independent on compiler changes, so the existing functionality should work as it did (at least I think).

@generalmimon
Copy link
Member

Also I've just extended the description of --read-write a bit to match the current situation, hopefully that's OK (JavaMain.scala:60-62):

      opt[Unit]('w', "read-write")  action { (x, c) =>
        c.copy(runtime = c.runtime.copy(readWrite = true, autoRead = false))
      } text("generate read-write support in classes (implies --no-auto-read, Java and Python only, default: read-only)")

@Letrab
Copy link

Letrab commented Jul 28, 2023

Are there some guidances how to support serialization for other languages? Especially interested in the JS one...
Happy to contribute where possible.

@izzyreal
Copy link

izzyreal commented Jul 29, 2023

I'm taking a look at

Slowly getting familiar with the tests submodule. Limiting myself to the 'java' flows.

Propagating class and method names into TestResult is trivial and we can get this kind of ci.json (actual output):

...
  "TestBcdUserTypeBe#testBcdUserTypeBe": {
    "status": "passed",
    "elapsed": 0.001
  },
  "TestBcdUserTypeBe#testReadWriteRoundtrip": {
    "status": "passed",
    "elapsed": 0.006
  },
...

However, tagging the results with is_kst: true is not so trivial. kst_adoption.log currently relies on congruence between filename and test name. Since serialization departs from this congruence, we'll have to tweak some kst_adoption machinery.

I was thinking we could generate kst_adoption_spec.log and kst_adoption_specwrite.log, instead of a single kst_adoption.log, by doing Dir.glob('spec/java/src/**/specwrite/Test*.java') and Dir.glob('spec/java/src/**/spec/Test*.java') in kst-adoption-report. Then we give TestResult an is_specwrite: Boolean field that is derived from the presence of specwrite in the FQN of the class name, and now convert_to_json#add_kst_adoption can tag accordingly.

What do you think?

@izzyreal
Copy link

izzyreal commented Jul 29, 2023

I haven't looked at the full pipeline, but since serialization affects test names in several ways, we may want to anticipate the implications for the order in which they appear on https://ci.kaitai.io/.

Moreover, perhaps I could start working on kaitai_ci_ui to prepare it for adding the write test results (@GreyCat mentioned this on Gitter). I've created a ticket to refine the layout in kaitai-io/kaitai_ci_ui#44.

@GreyCat
Copy link
Member Author

GreyCat commented Jul 30, 2023

I've took liberty at merging kaitai_struct_java_runtime — see kaitai-io/kaitai_struct_java_runtime@e34e5a9. A few very minor conflicts, otherwise all read-only tests work just as before.

@GreyCat
Copy link
Member Author

GreyCat commented Jul 30, 2023

Also merged https://github.com/kaitai-io/kaitai_struct_python_runtime — see kaitai-io/kaitai_struct_python_runtime@704995a.

No changes in tests, no conflicts.

@generalmimon
Copy link
Member

Also merged https://github.com/kaitai-io/kaitai_struct_python_runtime — see kaitai-io/kaitai_struct_python_runtime@704995a.

Seems a bit weird to me to do this as a fast-forward merge, but OK.

generalmimon added a commit to kaitai-io/kaitai_struct_compiler that referenced this issue Jul 31, 2023
+ let --read-write imply --zero-copy-substream=false, see
kaitai-io/kaitai_struct#1060 (comment)
@generalmimon
Copy link
Member

generalmimon commented Jul 31, 2023

I've just resolved the merge conflicts on branch https://github.com/kaitai-io/kaitai_struct_compiler/tree/serialization.

@GreyCat
Copy link
Member Author

GreyCat commented Aug 1, 2023

For tracking purposes — PRs to discuss the merge:

@yywing
Copy link

yywing commented Nov 22, 2023

How is it going? I wanner to do sth for go.

@yywing
Copy link

yywing commented Dec 11, 2023

There is a question: we should imply function to calc length?
image

meta:
  id: value_instances
seq:
  - id: len_data_raw
    type: u1
  - id: data
    size: len_data
instances:
  len_data:  # for read
    value: len_data_raw - 3
  calc_length:  # for write
    value: data.length + 3

calc_length for write.

ValueInstances r = new ValueInstances();

r.setData(new byte[] { 1, 2, 3, 4, 5 });
r.setLenDataRaw(r.calcLength());
System.out.println(r.lenData()); // => 5
  1. array length reduce: we cannot range array.
seq:
  - id: length
    type: u2be
  - id: params
    type: params
    size: len_params
instances:
  len_params:
    value: length - 2
  calc_length:
    value: params.calc_length + 2
types:
  params:
    seq:
      - id: params
        type: parameter
        repeat: eos
    instances:
      calc_length:
        value: params.reduce(0, r + i.calc_length)

  parameter:
    seq:
      - id: length
        type: u2be      
      - id: data
        size: len_data
    instances:
      len_data:
        value: length - 2
      calc_length:
        value: data.length + 2

@yywing
Copy link

yywing commented Dec 11, 2023

The implementation of Golang is almost complete, but there have been some breaking changes and some areas that do not meet my requirements. If no one responds to my questions, then I will have to implement it according to my own ideas. As a result, there is a high probability that it won't be merged into the master branch.

@dakhnod
Copy link

dakhnod commented May 10, 2024

What is the status here? Thanks for everyone working on this.
I would like to use Kaitai for numerous Gadgetbridge gadgets, and serialization is paramount there.

Is there any chance of serialization support in the main java package?
Maybe even in the web IDE?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants