Improve json formatting to improve readability #147

Open
Tisten opened this issue Oct 6, 2022 · 7 comments

@Tisten
Contributor

Tisten commented Oct 6, 2022

When I compared dl to cap'n'proto, the most striking thing cap'n'proto did better was its awesome json formatting:
Example cap'n'proto:

    {"ptr": {"graphComponent": {"graph": {
      "nodes": [
        {"type": "tm_tick_event", "label": "", "positionX": -537.10931396484375, "positionY": -14.238007545471191, "settings": {"ptr": {}}},
        {"type": "tm_mixer_play_wav", "label": "", "positionX": -332.38427734375, "positionY": -171.35000610351562, "settings": {"ptr": {}}},
        {"type": "tm_mixer_set_pitch", "label": "", "positionX": 679.589111328125, "positionY": -23.874832153320312, "settings": {"ptr": {}}},
        {"type": "tm_vec3_length", "label": "", "positionX": -124.12257385253906, "positionY": 122.64999389648438, "settings": {"ptr": {}}},

And the same in dl:

      }, {
        "GraphComponent" : "ptr_488"
      }, {
...
        "ptr_488" : {
          "Graph" : "ptr_496"
        },
...
        "ptr_496" : {
          "Nodes" : [
          {
              "Type" : "tm_tick_event",
              "Label" : null,
              "PositionX" : -537.109314,
              "PositionY" : -14.2380075,
              "Width" : 0,
              "Settings" : {
                "AimConstraint" : null
              }
            }, {
              "Type" : "tm_mixer_play_wav",
              "Label" : null,
              "PositionX" : -332.384277,
              "PositionY" : -171.350006,
              "Width" : 0,
              "Settings" : {
                "AimConstraint" : null
              }
            }, {
              "Type" : "tm_mixer_set_pitch",
              "Label" : null,
              "PositionX" : 679.589111,
              "PositionY" : -23.8748322,
              "Width" : 0,
              "Settings" : {
                "AimConstraint" : null
              }
            }, {
              "Type" : "tm_vec3_length",
              "Label" : null,
              "PositionX" : -124.122574,
              "PositionY" : 122.649994,
              "Width" : 0,
              "Settings" : {
                "AimConstraint" : null
              }

@Tisten
Contributor Author

Tisten commented Oct 6, 2022

Besides avoiding excess newlines, writing pointer payloads "inline" instead of at the end of the file makes it much easier for a human to read.

@wc-duck
Owner

wc-duck commented Oct 6, 2022

Personally I like the "excessive" newlines and I find that easier to read.
I however see your point on the pointers!
How do you represent a pointer placed in line if it is pointed to more than once?
Is the "ptr" : {} element some kind of marker and can have an ID?
And in the Cap'n proto data I don't see other pointer-references, just a list?

@Tisten
Contributor Author

Tisten commented Oct 7, 2022

Cap'n'proto flattens (i.e. removes) all pointers except AnyPointers (unions) when going to json, so circular references don't work at all and data referenced from several places gets duplicated. To keep the structural integrity when a pointer is referenced from multiple places, it still needs to be identifiable, i.e. have a unique name or tag. The inlined data could then be written in either all of those places or just one, e.g. where the first reference to the data is. If the data is flattened and written in all places, then #14 could solve deduplicating it. And even if you went the "flatten everything" route, you would still need to abort on cyclic references and have an identifier to refer to.
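
A minimal sketch of the "inline at the first reference, refer by tag afterwards" idea, written in Python rather than against dl's real data model. The `__id` and `__ref` key names and the `ptr_N` tags are hypothetical, chosen only for illustration, and the sketch assumes that only objects (not arrays) are shared or cyclic.

```python
def inline_pointers(root):
    """Write each shared object inline at its first reference and
    replace later references with a small {"__ref": tag} marker."""
    counts = {}  # id(obj) -> how many times the object is referenced
    tags = {}    # id(obj) -> assigned tag such as "ptr_1" (hypothetical naming)

    def count(obj):
        if isinstance(obj, dict):
            counts[id(obj)] = counts.get(id(obj), 0) + 1
            if counts[id(obj)] > 1:
                return  # already visited, which also stops cycles
            for value in obj.values():
                count(value)
        elif isinstance(obj, list):
            for value in obj:
                count(value)

    def emit(obj):
        if isinstance(obj, dict):
            key = id(obj)
            if key in tags:
                return {"__ref": tags[key]}  # second or later reference
            out = {}
            if counts[key] > 1:
                # Tag the object before recursing so a cycle can refer back to it.
                tags[key] = f"ptr_{len(tags) + 1}"
                out["__id"] = tags[key]
            for name, value in obj.items():
                out[name] = emit(value)
            return out
        if isinstance(obj, list):
            return [emit(value) for value in obj]
        return obj

    count(root)
    return emit(root)
```

Objects that are referenced only once get no tag at all, so the common case stays as clean as the flattened output while shared and cyclic data remains representable.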

I guess the same thing is true for arrays, but since they are already written without a unique name I guess that dl already flattens them even if they refer to the same pointer?

My two main points about the newlines are:

  1. I can often read the data of a whole game object on one screen, while in dl the graph object I looked at here took 6 screens instead of 2/3 of a screen (43 nodes took 387 lines), which required a lot of scrolling and mental load to memorize things. I tried making the font smaller but could still only fit 15 nodes (135 lines) before the text became unreadable.
    That said, it would help if the data of members were aligned, and I like your 32 bit float representation better.
  2. When inlining pointers (and arrays, which are already inlined) the indentation can become huge, i.e. an indentation tower which Eiffel would be envious of.

It would be awesome if json formatting could be done using formatting rules similar to "clang-format", so each user can choose their own style. The more I think of it, the more I think that reformatting the json is something which can be done after DL has created the json, i.e. by piping the data to another tool. So DL could just avoid writing any whitespace, and let the formatting tool add all that. It would be slower, but if the data could be piped in chunks then formatting could mostly be done in parallel with DL's json generation, so a GB file would not require twice the time.
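
To illustrate the chunked piping, here is a rough sketch of a separate formatter that adds newlines and indentation to whitespace-free json one chunk at a time, without ever holding the whole document. `StreamingIndenter` and `format_stream` are hypothetical names, not part of dl, and the sketch assumes the input is valid json with no whitespace outside strings.

```python
class StreamingIndenter:
    """Adds newlines and indentation to whitespace-free JSON, chunk by chunk."""

    def __init__(self, indent="  "):
        self.indent = indent
        self.depth = 0
        self.in_string = False
        self.escaped = False
        self.just_opened = False  # previous character was '{' or '['

    def feed(self, chunk):
        out = []
        for ch in chunk:
            if self.in_string:
                # Inside a string everything is copied verbatim.
                out.append(ch)
                if self.escaped:
                    self.escaped = False
                elif ch == "\\":
                    self.escaped = True
                elif ch == '"':
                    self.in_string = False
                continue
            if ch in "}]":
                self.depth -= 1
                if not self.just_opened:  # keep empty containers as {} or []
                    out.append("\n" + self.indent * self.depth)
                self.just_opened = False
                out.append(ch)
                continue
            if self.just_opened:
                # Newline after an opener is deferred until we know it is not empty.
                out.append("\n" + self.indent * self.depth)
                self.just_opened = False
            if ch in "{[":
                out.append(ch)
                self.depth += 1
                self.just_opened = True
            elif ch == ",":
                out.append(",\n" + self.indent * self.depth)
            elif ch == ":":
                out.append(" : ")
            else:
                if ch == '"':
                    self.in_string = True
                out.append(ch)
        return "".join(out)


def format_stream(src, dst, chunk_size=65536):
    """Read compact JSON from `src` in chunks and write formatted text to `dst`."""
    indenter = StreamingIndenter()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(indenter.feed(chunk))
```

Because all state lives in a few fields, a chunk boundary can fall anywhere, even in the middle of a string or an escape sequence. Calling `format_stream(sys.stdin, sys.stdout)` would turn this into a pipe-style tool; style options in the spirit of clang-format would be extra constructor arguments.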

@wc-duck
Owner

wc-duck commented Oct 27, 2022

Yes, I wouldn't mind member-data alignment either, if what you mean by that is:

{
    "member_1" : 1234,
    "short"    : 3456
}
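
A small sketch of how such alignment could be produced, assuming a flat object whose values are short enough to stay on one line. `dump_aligned` is a hypothetical helper written in Python for illustration, not dl's writer.

```python
import json


def dump_aligned(obj, indent=4):
    """Format a dict so that the ':' separators line up in one column.

    Nested values are written compactly on a single line.
    """
    if not obj:
        return "{}"
    keys = [json.dumps(key) for key in obj]
    width = max(len(key) for key in keys)  # widest quoted key sets the column
    pad = " " * indent
    lines = [
        f"{pad}{key.ljust(width)} : {json.dumps(value)}"
        for key, value in zip(keys, obj.values())
    ]
    return "{\n" + ",\n".join(lines) + "\n}"
```

Calling `dump_aligned({"member_1": 1234, "short": 3456})` reproduces the layout shown above.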

@wc-duck
Owner

wc-duck commented Oct 27, 2022

Also, I think vectors of numbers are single-line, right? If they are not, I think they should be.

@wc-duck
Owner

wc-duck commented Oct 27, 2022

But as you say... formatting is highly, highly personal, so being able to pipe it through some kind of formatter might be the best solution. However, the current API does not support streaming output, and I think it would require quite a bit of new API that would probably "break" the current API structure.

But an "unformatted" json output, would that just be no newlines at all, basically just a big long single line?

@Tisten
Contributor Author

Tisten commented Oct 27, 2022

Yes, arrays of primitives and pointers are always single line, even when they are epic in length.

And yes, you understood the data-alignment correctly.

In my mind the unformatted style has no whitespace or newlines at all: it is the smallest memory footprint to start the reformatting from, and there is no need to strip whitespace before adding new.
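
For reference, this is what that whitespace-free style looks like when produced with Python's standard `json` module. It only illustrates the target output and says nothing about how dl would emit it; `dump_compact` is a hypothetical wrapper name.

```python
import json


def dump_compact(value):
    # Separators without spaces drop all optional whitespace,
    # so the whole document ends up on one line.
    return json.dumps(value, separators=(",", ":"))
```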

The implementation used by cap'n'proto to make the formatting simple is a "string tree", where all elements are leaves in the tree, parented by the lists and objects owning them. Each branch can provide the summed length of all its children, making it trivial to know which lists are appropriate to keep on one line, and which elements to insert newlines and indentation between. It also makes it easy to insert sub-strings into the tree while building it, and can reduce the memory footprint since identical strings can be reused instead of duplicated.
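
A much simplified sketch of that idea in Python. It is not the `kj::StringTree` API, and it leaves out the string reuse; `Node`, `build`, `flat`, `render` and the `max_width` rule are assumptions made for illustration. Every node stores its one-line width, so deciding whether a list or object fits on the current line is a single comparison.

```python
import json


class Node:
    """Leaf (text only) or branch (text prefix + opener/closer + children)."""

    def __init__(self, text="", opener="", closer="", children=()):
        self.text, self.opener, self.closer = text, opener, closer
        self.children = list(children)
        separators = 2 * max(len(self.children) - 1, 0)  # ", " between children
        # Summed one-line width of this node and everything below it.
        self.length = (len(text) + len(opener) + len(closer) + separators
                       + sum(child.length for child in self.children))


def build(value, prefix=""):
    """Turn a parsed JSON value into a tree; `prefix` carries the '"key": ' part."""
    if isinstance(value, dict):
        kids = [build(v, json.dumps(k) + ": ") for k, v in value.items()]
        return Node(prefix, "{", "}", kids)
    if isinstance(value, list):
        return Node(prefix, "[", "]", [build(v) for v in value])
    return Node(prefix + json.dumps(value))


def flat(node):
    """Render a node on a single line."""
    inner = ", ".join(flat(child) for child in node.children)
    return node.text + node.opener + inner + node.closer


def render(node, max_width=80, indent="  ", depth=0):
    """Keep a node on one line if it fits, otherwise break between its children."""
    if not node.children or len(indent) * depth + node.length <= max_width:
        return flat(node)
    pad = indent * (depth + 1)
    inner = ",\n".join(
        pad + render(child, max_width, indent, depth + 1) for child in node.children
    )
    return f"{node.text}{node.opener}\n{inner}\n{indent * depth}{node.closer}"
```

With a narrow `max_width`, outer containers break across lines while each small node object stays on its own single line, which is close to the cap'n'proto output in the first comment.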

Unfortunately it is terribly modern code, very big interfaces, very few lines in implementation and utterly impossible to understand by reading. Source here:
https://github.com/capnproto/capnproto/blob/3b2e368cecc4b1419b40c5970d74a7a342224fac/c%2B%2B/src/kj/string-tree.h#L69
https://github.com/capnproto/capnproto/blob/3b2e368cecc4b1419b40c5970d74a7a342224fac/c%2B%2B/src/capnp/stringify.c%2B%2B#L57
